Preface



This volume contains the proceedings of the 12th International Conference on 
Computer Aided Verification (CAV 2000) held in Chicago, Illinois, USA during 
15-19 July 2000. 

The CAV conferences are devoted to the advancement of the theory and 
practice of formal methods for hardware and software verification. The confe- 
rence covers the spectrum from theoretical foundations to concrete applications, 
with an emphasis on verification algorithms, methods, and tools together with 
techniques for their implementation. The conference has traditionally drawn 
contributions from both researchers and practitioners in academia and industry. 
This year 91 regular research papers were submitted out of which 35 were ac- 
cepted, while 14 brief tool papers were submitted, out of which 9 were accepted 
for presentation. CAV included two invited talks and a panel discussion. CAV 
also included a tutorial day with two invited tutorials. 

Many industrial companies have shown a serious interest in CAV, ranging 
from using the presented technologies in their business to developing and mar- 
keting their own formal verification tools. We are very proud of the support 
we receive from industry. CAV 2000 was sponsored by a number of generous 
and forward-looking companies and organizations including: Cadence Design Sy- 
stems, IBM Research, Intel, Lucent Technologies, Mentor Graphics, the Minerva 
Center for Verification of Reactive Systems, Siemens, and Synopsys. 

The CAV conference was founded by its Steering Committee: Edmund Clarke 
(CMU), Bob Kurshan (Bell Labs), Amir Pnueli (Weizmann), and Joseph Sifakis 
(Verimag).

The conference program for this year’s CAV 2000 was selected by the pro- 
gram committee: Parosh Abdulla (Uppsala), Rajeev Alur (U. Penn and Bell 
Labs), Henrik Reif Andersen (ITU Copenhagen), Ed Brinksma (Twente), Randy 
Bryant (CMU), Werner Damm (Oldenburg), David Dill (Stanford), E. Allen 
Emerson, co-chair (U. Texas-Austin), Steven German (IBM), Rob Gerth (Intel),
Patrice Godefroid (Bell Labs), Ganesh Gopalakrishnan (U. Utah), Mike Gor- 
don (Cambridge), Nicolas Halbwachs (Verimag), Warren Hunt (IBM), Bengt 
Jonsson (Uppsala), Kim Larsen (Aalborg), Ken McMillan (Cadence), John Mit- 
chell (Stanford), Doron Peled (Bell Labs), Carl Pixley (Motorola), Amir Pnueli 
(Weizmann), Bill Roscoe (Oxford), Joseph Sifakis (Verimag), A. Prasad Sistla, 
co-chair (U. Illinois-Chicago), Fabio Somenzi (U. Colorado), and Pierre Wolper 
(Liege). 

We are grateful to the following additional reviewers who aided the reviewing 
process: Will Adams, Nina Amla, Flemming Andersen, Tamarah Arons, Eu- 
gene Asarin, Mohammad Awedh, Adnan Aziz, Clark Barrett, Gerd Behrmann, 
Wendy Belluomini, Michael Benedikt, Saddek Bensalem, Ritwik Bhattacharya, 
Tom Bienmueller, Per Bjesse, Roderick Bloem, Juergen Bohn, Bernard Boigelot, 
Ahmed Bouajjani, Olivier Bournez, Marius Bozga, P. Broadfoot, Udo Brockmeyer,
Glenn Bruns, Annette Bunker, Paul Caspi, Prosenjit Chatterjee, Hubert
Common, Jordi Cortadella, Sadie Creese, David Cyrluk, Pedro D’Argenio, Sa- 
tyaki Das, Luca de Alfaro, Willem-P. de Roever, Juergen Dingel, Dan DuVarney, 
Joost Engelfriet, Kousha Etessami, David Fink, Dana Fisman, Martin Fraenzle, 
Laurent Fribourg, Malay Ganai, Vijay Garg, Jens Chr. Godskesen, Jeff Golden, 
M. H. Goldsmith, Guarishankar Govindaraju, Susanne Graf, Radu Grosu, Aarti 
Gupta, Dilian Gurov, John Havlicek, Nevin Heinze, Holger Hermanns, Thomas 
Hildebrandt, Pei-Hsin Ho, Ravi Hosabettu,
Henrik Hulgaard, Thomas Hune, Hardi Hungar, Anna Ingolfsdottir, Norris Ip,
Purushothaman Iyer, Hans Jacobson, Damir Jamsek, Jae-Young Jang, Henrik
Ejersbo Jensen, Somesh Jha, Michael Jones, Bernhard Josko, Vineet Kahlon, 
Joost-Pieter Katoen, Yonit Kesten, Nils Klarlund, Josva Kleist, Kare Jelling 
Kristoffersen, Andreas Kuehlmann, Robert P. Kurshan, Yassine Lakhnech, Rom 
Langerak, Salvatore La Torre, Ranko Lazic, Jakob Lichtenberg, Orna Lichtenstein,
Jorn Lind-Nielsen, Hans Henrik Løvengreen, Enrico Macii, Angelika Mader,
Oded Maler, Pete Manolios, Monica Marcus, Abdelillah Mokkedem, Faron
Moller, Jesper Moller, Oliver Moller, In-Ho Moon, Laurent Mounier, Chris My- 
ers, Luay Nakhleh, Kedar Namjoshi, Tom Newcomb, Flemming Nielson, Kasper 
Overgard Nielsen, Marcus Nilsson, Thomas Noll, David Nowak, Aletta Nylen, 
Manish Pandey, George Pappas, Atanas Parashkevov, Abelardo Pardo, Cathe- 
rine Parent-Vigouroux, David Park, Justin Pearson, Paul Pettersson, Nir Piter- 
man, Carlos Puchol, Shaz Qadeer, Stefano Quer, Theis Rauhe, Antoine Rauzy, 
Kavita Ravi, Judi Romijn, Sitvanit Ruah, Theo Ruys, Jun Sawada, Alper Sen, 
Peter Sestoft, Ali Sezgin, Elad Shahar, Ofer Shtrichman, Arne Skou, Uli Stern, 
Kanna Shimizu, Scott D. Stoller, Ian Sutherland, Richard Treffer, Jan Tret- 
mans, Stavros Tripakis, Annti Valmari, Helmut Vieth, Sergei Vorobyov, Bow- 
Yaw Wang, Farn Wang, Poul F. Williams, Chris Wilson, Hanno Wupper, Jason 
Yang, Wang Yi, Tsay Yih-Kuen, Sergio Yovine, and Jun Yuan. 

Finally, we would like to give our special thanks to John Havlicek for his 
enormous assistance overall including maintaining the CAV web site, the cav2k 
account, and in preparing the proceedings. We appreciate the assistance of the 
UTCS computer support staff, especially John Chambers. We are also most 
grateful to Richard Gerber for kindly lending us his “START” conference management
software as well as his prompt assistance when a file server error
masqueraded as a web server error. 
Keynote Address



Abstraction, Composition, Symmetry, and a 
Little Deduction: 

The Remedies to State Explosion 



Amir Pnueli 

Faculty of Mathematics and Computer Science 
The Weizmann Institute of Science 
76100 Rehovot, Israel 



Abstract. In this talk, we will consider possible remedies to the State 
Explosion problem, enabling the verification of large designs. All of these 
require some user interaction and cannot be done in a fully automatic 
manner. We will explore the tradeoffs and connections between the dif- 
ferent approaches, such as deduction and abstraction, searching for the 
most natural and convenient mode of user interaction, and speculate ab- 
out useful additional measures of automation which can make the task 
of user supervision even simpler. 
Invited Address:



Applying Formal Methods to Cryptographic 
Protocol Analysis 



Catherine Meadows 

Naval Research Laboratory 
Washington, DC 20375 



Abstract. Protocols using encryption to communicate securely and pri- 
vately are essential to the protection of our infrastructure. However, since 
they must be designed to work even under the most hostile conditions, it 
is not easy to design them correctly. As a matter of fact, it is possible for 
such protocols to be incorrect even if the cryptographic algorithms they 
use work perfectly. Thus, over the last few years there has been consi- 
derable interest in applying formal methods to the problem of verifying 
that these protocols are correct. In this talk we give a brief history of 
this area, and describe some of the emerging issues and new research 
problems. 
Invited Tutorial:



Boolean Satisfiability Algorithms and 
Applications in Electronic Design Automation 

Joao Marques-Silva^1 and Karem Sakallah^2

^1 Instituto de Engenharia de Sistemas e Computadores (INESC)

R. Alves Redol, 9
1000-029 Lisboa, Portugal

^2 Electrical Engineering and Computer Science Department
Advanced Computer Architecture Laboratory (ACAL) 

The University of Michigan 
Ann Arbor, Michigan 48109-2122 



Abstract. Boolean Satisfiability (SAT) is often used as the underlying
model for a significant and increasing number of applications in Electronic
Design Automation (EDA) as well as in many other fields of
Computer Science and Engineering. In recent years, new and efficient 
algorithms for SAT have been developed, allowing much larger problem 
instances to be solved. SAT “packages” are currently expected to have an
impact on EDA applications similar to that of BDD packages since their 
introduction more than a decade ago. This tutorial paper is aimed at 
introducing the EDA professional to the Boolean satisfiability problem. 
Specifically, we highlight the use of SAT models to formulate a num- 
ber of EDA problems in such diverse areas as test pattern generation, 
circuit delay computation, logic optimization, combinational equivalence 
checking, bounded model checking and functional test vector generation, 
among others. In addition, we provide an overview of the algorithmic 
techniques commonly used for solving SAT, including those that have 
seen widespread use in specific EDA applications. We categorize these 
algorithmic techniques, indicating which have been shown to be best 
suited for which tasks. 



E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, p. 3, 2000. 
Invited Tutorial:



Verification of Infinite-State and Parameterized 

Systems 



Parosh Aziz Abdulla and Bengt Jonsson 



Department of Computer Systems 
Uppsala University 
Uppsala, Sweden 



Abstract. Over the last few years there has been an increasing research
effort directed towards the automatic verification of infinite-state systems.
There are now verification techniques for many classes of infinite-state
systems, including timed and hybrid automata, Petri nets, pushdown
systems, systems with FIFO channels, systems with a simple treatment
of data, etc. In this tutorial, we will cover general verification
techniques that have been used for infinite-state and parameterized systems,
and try to show their power and limitations. Such techniques include
symbolic model-checking techniques, abstraction, induction over the
network structure, widening, and automata-based techniques. We will
focus on linear-time safety and liveness properties.
An Abstraction Algorithm for the Verification of
Generalized C-Slow Designs 



Jason Baumgartner^1, Anson Tripp^1, Adnan Aziz^2, Vigyan Singhal^3, and

Flemming Andersen^1

^1 IBM Corporation, Austin, Texas 78758, USA,

{jasonb,ajt,fanders}@austin.ibm.com
^2 The University of Texas, Austin, Texas 78712, USA,
adnan@ece.utexas.edu

^3 Tempus Fugit, Inc., Albany, California 94706, USA
vigyan@home.com



Abstract. A c-slow netlist N is one which may be retimed to another 
netlist N', where the number of latches along each wire of N' is a multiple
of c. Leiserson and Saxe [1, page 54] have shown that by increasing
c (a process termed slowdown) and retiming, any design may be made 
systolic, thereby dramatically decreasing its cycle time. In this paper we 
develop a new fully-automated abstraction algorithm applicable to the 
verification of generalized c-slow flip-flop based netlists; the more generalized
topology accepted by our approach allows applicability to a fairly
large class of pipelined netlists. This abstraction reduces the number of 
state variables and divides the diameter of the model by c; intuitively, 
it folds the state space of the design modulo c. We study the reachable 
state space of both the original and reduced netlists, and establish a c- 
slow bisimulation relation between the two. We demonstrate how CTL* 
model checking may be preserved through the abstraction for a useful 
fragment of CTL* formulae. Experiments with two components of IBM’s 
Gigahertz Processor demonstrate the effectiveness of this abstraction al- 
gorithm. 



1 Introduction 

Leiserson and Saxe [1,2] have defined a c-slow netlist N as one which is retiming 
equivalent to another netlist N', where the number of latches along each wire of
N' is a multiple of c. Netlist N' may be viewed as having c equivalence classes of 
latches; latches in class i may only fan out to latches in class (i + 1) mod c. Each
equivalence class of latches of N' contains data from an independent stream of 
execution, and data from two or more independent streams may never arrive at 
any netlist element concurrently. They demonstrate that designs may be made 
systolic through slowdown (increasing c), and how this process dramatically be- 
nefits the cycle time of such designs. We have observed at IBM a widespread use 
of such pipelining through slowdown, for example, in control logic which routes 
tokens to and from table-based logic (e.g., caches and instruction dispatch logic). 
This use has been hand-crafted during the design development, not achieved via
a synthesis tool. 

The purpose of our research is to develop a sound and complete abstraction 
algorithm for the verification of a generalized class of flip-flop (FF) based c-slow 
designs, which eliminates all but one equivalence class of FFs and reduces the 
diameter of the model by a factor of c. To the best of our knowledge, this paper is 
the first to exploit the structure of c-slow designs for enhanced verification. Our 
motivating example was an intricate five-stage pipelined control netlist with 
feedback, and with an asynchronous interrupt to every stage. This interrupt 
input prevents the classification of this design as c-slow by the definition in [2], 
but our generalization of that definition enables us to classify and perform a 
c-slow abstraction upon this example. Due to its complexity and size, model 
checking this netlist required enormous computational resources even after
considerable manual abstraction. 

Retiming itself is insufficient to achieve the results of c-slow abstraction. 
Retiming is not guaranteed to reduce the diameter of the model, and does not 
change the number of latches along a directed cycle. It should also be noted 
that a good BDD ordering cannot achieve the benefits of this abstraction. For 
example, assume that we have two netlists, N and N', where N is equivalent 
to N' except that the FF count along each wire of N' is multiplied by c. One 
may hypothesize that a BDD ordering which groups the FFs along each wire 
together may bound the BDD size for the reachable set of N to within a factor 
of c of that of N'; however, we have found counterexamples to this hypothesis. 
A better ordering groups all variables of a given class together, as the design 
could be viewed as comprising c independent machines in parallel. However, 
such ordering will not reduce the number of state variables nor the diameter of 
the model. 

A related class of research has considered the transformation of level-sensitive 
latch-based netlists to simpler edge-sensitive FF-based ones. (Refer to [3] for a 
behavioral definition of these latch types.) Hasteer et al. [4] have shown that 
multi-phase netlists may be “phase abstracted” to simpler FF-based netlists 
for sequential hardware equivalence. Baumgartner et al. [5] have taken a simi- 
lar approach (called “dual-phase abstraction” for a 2-phase design) for model 
checking. This approach preserves initial values and provides greater reduction 
in the number of state variables than the phase abstraction from [4] alone. Phase 
abstraction is fundamentally different than c-slow abstraction. For example, in 
multi-phase designs, only one class of latches updates at each time-step. Fur- 
thermore, the initial values of all but one class of latches will be overwritten 
before propagation. Therefore, given a FF-based design, these approaches are of 
no further benefit. 

The remainder of the document is organized as follows. Section 2 provides 
our definition of c-slow netlists, and introduces a 3-slow netlist N. In Section 3 
we introduce our abstraction algorithm, as well as the abstracted version of N. 
We demonstrate the correctness of the abstraction in Section 4. In particular 
we demonstrate a natural correspondence between the original and abstracted 
designs which we refer to as a c-slow bisimulation, and discuss the effect of the
abstraction upon CTL* model checking. In Section 5 we introduce algorithms to 
determine the maximum c and to translate traces from the abstracted model to 
traces in the original netlist. We provide experimental results in Section 6, and 
in Section 7 we summarize and present future work items. 



2 C-Slow Netlists 

In this section we provide our definition of c-slow netlists, which is a more general 
definition than that of [2]. We assume that the netlist contains no level-sensitive
latches; if it does, phase abstraction [5] should be performed to yield a FF-based 
netlist. We further assume that the netlist has no gated-clock FFs; if it does, 
it is at most 1-slow (since an inactive gate mandates that the next state of the 
FF be equivalent to its present state). Each c-slow netlist N is comprised of c 
equivalence classes of FFs characterized as follows. 

Definition 1. A c-slow netlist is one whose gates and FFs may be c-colored 
such that: 

1. Each FF is assigned a color i. 

2. All FFs which have FFs of color i in their support have color (i + 1) mod c.

3. All gates which have FFs of color i in their support have color i. 

There are several noteworthy points in the above definition. First, since no 
gate may contain FFs of more than one color in its support, the design may 
not reason about itself except “modulo c”. Intuitively, it is this property which
allows us to “fold” the design to a smaller domain of a single coloring of FFs. 

Second, the coloring restrictions apply only to FFs and to gates which have 
FFs in their support. Hence, it is legal for primary inputs to fanout to FFs of 
multiple colors, which makes our definition more general than that in [2]. This 
generality has shown great potential in extending the class of netlists to which 
we may apply the c-slow abstraction; the two design fragments of the Gigahertz 
Processor from our sample set which were found to be c-slow would not have 
been classifiable as such without this generality. 
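
The coloring conditions of Definition 1 can be checked mechanically. The following is a minimal sketch in Python, not the implementation used in this work; the netlist encoding (`ff_support` mapping each FF to the FFs in the support of its next-state function, `gate_support` doing the same for gates) and both function names are assumptions made for illustration.

```python
from collections import deque

def find_c_coloring(ff_support, c):
    """Try to color FFs with 0..c-1 so that an FF whose support contains a
    color-i FF receives color (i + 1) mod c. Returns a dict, or None if no
    such coloring exists for this c.

    ff_support (hypothetical encoding) maps each FF name to the set of FFs
    in the support of its next-state function."""
    # Each support edge src -> dst forces color[dst] = color[src] + 1 (mod c);
    # store it in both directions so colors can be propagated either way.
    neighbors = {ff: [] for ff in ff_support}
    for dst, sources in ff_support.items():
        for src in sources:
            neighbors[src].append((dst, +1))
            neighbors[dst].append((src, -1))
    color = {}
    for root in ff_support:
        if root in color:
            continue
        color[root] = 0  # arbitrary choice per connected component
        queue = deque([root])
        while queue:
            ff = queue.popleft()
            for other, step in neighbors[ff]:
                expected = (color[ff] + step) % c
                if other not in color:
                    color[other] = expected
                    queue.append(other)
                elif color[other] != expected:
                    return None  # conflicting constraints: not c-colorable
    return color

def gates_are_single_colored(gate_support, color):
    """Condition 3: no gate has FFs of more than one color in its support."""
    return all(len({color[ff] for ff in ffs}) <= 1
               for ffs in gate_support.values())
```

For a three-FF ring in which X depends on Z, Y on X, and Z on Y, `find_c_coloring` succeeds for c = 3 and fails for c = 2. Primary inputs are deliberately absent from the encoding, since the coloring restrictions do not apply to them.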




Fig. 1. 3-Stage Netlist N 
Consider the generic 3-slow netlist N depicted in Figure 1. We assume that
this netlist represents a composition of the design under test and its environment. 
We also assume that no single input will fan out to FFs of different colors. Later 
in this section we demonstrate that we may soundly alter netlists which violate 
this assumption by splitting such inputs into functionally identical yet distinct 
cones, one per color. All nets may be vectors. We arbitrarily define the color of 
FFs X as 0, of Y as 1, and of Z as 2. 

Consider the first several time-steps of symbolic reachability analysis of N 
in Table 1 below. The symbolic values a_i, b_i, and c_i represent arbitrary values
producible at inputs A, B, and C respectively. The symbol i_j represents the
initial value of the FFs of color j.
t0_3 = f_3(c_4, f_2(b_3, t0_2))
t1_3 = f_1(a_4, f_3(c_3, t1_2))
t2_3 = f_2(b_4, f_1(a_3, t2_2))



Table 1. Reachability Analysis of N 



At any instant in time, each signal may only be a function of the value 
generated by a given “data source” at every c-th cycle. By data source, we 
refer to inputs and initial values. This observation illustrates the motivation 
behind this abstraction; our transformation yields a design where each signal 
may concurrently (nondeterministically) be a function of each data source at 
each cycle. 
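
The observation above can be reproduced with a small dependency-tracking simulation. This is only a sketch under an assumed topology: a three-FF ring in which X is driven by f_1(A, Z), Y by f_2(B, X), and Z by f_3(C, Y). That topology, the source names `i0`, `i1`, `i2` for the initial values, and the function names are illustrative assumptions, since Figure 1 is not reproduced here.

```python
def simulate_sources(steps):
    """For each FF of the assumed ring, record the set of data sources
    (name, cycle) on which its value depends after each cycle."""
    # Initial values are data sources sampled at cycle 0.
    state = {"X": {("i0", 0)}, "Y": {("i1", 0)}, "Z": {("i2", 0)}}
    history = [state]
    for t in range(1, steps + 1):
        state = {
            "X": {("A", t)} | state["Z"],  # X <- f_1(A, Z)
            "Y": {("B", t)} | state["X"],  # Y <- f_2(B, X)
            "Z": {("C", t)} | state["Y"],  # Z <- f_3(C, Y)
        }
        history.append(state)
    return history

def modulo_c_property(history, c=3):
    """True iff, within every signal at every cycle, each data source is
    only ever sampled at cycles that agree modulo c."""
    for state in history:
        for deps in state.values():
            residues = {}
            for name, cycle in deps:
                if residues.setdefault(name, cycle % c) != cycle % c:
                    return False
    return True
```

Under these assumptions, X at cycle 3 depends on A at cycle 3, C at cycle 2, B at cycle 1, and the initial value of color 0; the check holds for c = 3 and fails for c = 2.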

3 Abstraction of C-Slow Netlists 

In this section we illustrate the structural modifications necessary and sufficient 
to perform the c-slow abstraction. 

The algorithm for performing the abstraction is as follows. Color c - 1 FFs are
abstracted by replacement with a mux selected by a conjunction of a random
value with first-cycle; the output of this mux passes through a FF, and the
initial value of this FF is defined by the value which would appear at its input if 
first-cycle were asserted. Color 0 FFs are abstracted by replacement with a mux 
which is selected by first-cycle. All others are abstracted by replacement with 
a mux which is selected by a conjunction of a random value with first-cycle. If 
the netlist is a feed-forward pipeline, all variables may be replaced in this last 
manner, thereby converting the sequential netlist to a combinational one. 

Consider netlist N' shown in Figure 2 below, which represents our c-slow abstraction
of N. We initialize Z' with the value which would appear at its inputs
if first-cycle were 1. The new inputs ND and ND' represent unique nondeterministic
values. We will utilize one copy of this netlist (with first-cycle tied up) to
generate the initial values for Z'. This initial value will be utilized in a second
copy of this netlist where first-cycle is tied down, which is the copy that we will 
model check. Intuitively, the nondeterministic initial value allows the output of 
the rightmost mux to take all possible values observable at net I in N during 
the first c cycles. Thereafter, straightforward reachability analysis will ensure 
correspondence of N and N'.




The first two cycles of symbolic simulation of N' are provided in Table 2. To
compensate for the “nondeterministic initial state set” induced by this abstraction,
we have split this analysis into three timeframes; cycles 0_0 and 1_0 represent
the timeframe induced by the first element of the set - that element further
being a function only of the initial values of the FFs of color 0, cycles 0_1 and
1_1 that induced by the second element, etc. This analysis illustrates the manner in
which the design will be treated by the reachability engine.

There are several critical points illustrated by this table. First, the values 
present at FFs of color c - 1 in N', at time i along the path induced by initial
values of color j, are equivalent to those present upon their correspondents in N 
at time 3i + j. This may be inductively expanded by noting that ij in the above 
two tables need not correlate to initial values, but to any reachable state. 
t0_1 = f_2(b_6, f_1(a_5, t0_3))

Table 2. Reachability Analysis of N'



Another point is that, in order to “align” the values present at FFs of color
c - 1, we have been forced to temporally skew the inputs. For example, comparing
any state at time i in N' with state 3i + j in N, we see that the inputs which
fan out to all but color 0 FFs are skewed forward by an amount equal to the 
color of the FFs in their transitive fanout. While this may seem bizarre, recall 
our assumption that the netlist is a composition of the environment and design. 
Therefore, without loss of generality, the inputs may only be combinational free 
variables or constants. In either case, the values they may hold at any times i 
and j are equivalent. For each value that input vector A may take at time i 
(denoted Ai), there is an equivalent value that A may take at any time j, and 
vice-versa. This fact is crucial to the correctness of our abstraction. 

To further demonstrate this point, we now discuss how we handle inputs 
which fan out to FFs of multiple colors. We split such “multi-color” input logic 
cones into a separate cone per color. To illustrate the necessity of this technique, 
assume that inputs A and B were tied together in N - a single input AB.
Thus, values a_i and b_i are identical. Tying these together in N' would violate
the correspondence; for example, state t2_3 is a function of a_3 and b_4, each being
a distinct value producible by input AB. Clearly any state producible in N',
where a_3 and b_4 are equivalent, would be producible in N. However, any state in
N where a_3 and b_4 differ would be unproducible in N', unless we split AB into
two functionally equivalent, yet distinct, inputs - one for color 0 and the other
for color 1. Such splitting of shared input cones is straightforward, and may be
performed automatically in O(c · nets) time. This logic splitting may encompass
combinational logical elements, and even logic subcircuits including FFs which 
themselves are c-slow. The exact theory of such splitting of sequential cones is 
still under development, though may be visualized by noting that peripheral 
latches upon multi-color inputs clearly need not limit c. 
4 Correctness of Abstraction

In this section we define our notion of correspondence between the original and 
abstracted netlists, which we term a c-slow bisimulation (inspired by Milner’s 
bisimulation relations [6]). We will relate our designs to Kripke structures, which 
are defined as follows. 

Definition 2. A Kripke structure $\mathcal{K} = (S, S_0, A, \mathcal{L}, R)$, where $S$ is a set of states,
$S_0 \subseteq S$ is the set of initial states, $A$ is the set of atomic propositions, $\mathcal{L} : S \to 2^A$
is the labeling function, and $R \subseteq S \times S$ is the transition relation.

Our designs are described as Moore machines (using Moore machines, instead 
of the more general Mealy machines [7], simplifies the exposition for this paper, 
though our implementation is able to handle Mealy machines by treating outputs 
as FFs). We use the following definitions for a Moore machine and its associated 
structure (similar to Grumberg and Long [8]). 

Definition 3. A Moore machine $M = (L, S, S_0, I, O, V, \delta, \gamma)$, where $L$ is the set
of state variables (FFs), $S = 2^L$ is the set of states, $S_0 \subseteq S$ is the set of initial
states, $I$ is the set of input variables, $O$ is the set of output variables, $V \subseteq L$ is
the set of property visible FFs, $\delta \subseteq S \times 2^I \times S$ is the transition relation, and
$\gamma : S \to 2^O$ is the output function.

We uniquely identify each state by a subset of the state variables; intuitively, 
this subset represents those FFs which evaluate to a 1 at that state. We will 
define V to be the FFs of color c - 1.

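Definition 4. The Kripke structure associated with a Moore machine
$M = (L, S, S_0, I, O, V, \delta, \gamma)$ is denoted by $K(M) = (S^K, S_0^K, A, \mathcal{L}, R)$, where
$S^K = 2^{L \cup I}$, $S_0^K = \{s \in S^K : s - I \in S_0\}$, $A = V$, $\mathcal{L} : S^K \to 2^V$, and
$R((s, x), (t, y))$ iff $\delta(s, x, t)$.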

Intuitively, we define $S_0^K$ as the subset of $S^K$ which, when projected down
to the FFs, is equivalent to $S_0$. In the sequel we will use M to denote the Moore
machine as well as the Kripke structure for the machine. 

Definition 5. A c-transition of a Kripke structure M is a sequence of c transitions
from state $s_i$ to $s_{i+c}$, where $R(s_j, s_{j+1})$ for all $j$ such that $i \le j < i + c$.



Definition 6. Let M be the Kripke structure of a c-slow machine. We define 
the extended initial state set of M, $S_{init} \subseteq S$, as the set of all states reachable
within c - 1 time-steps from the initial state of M.
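
Definitions 5 and 6 translate directly into operations on an explicit transition relation. The sketch below assumes a Kripke structure small enough to enumerate, with the relation `R` given as a set of `(s, t)` pairs; the function names are illustrative, and a symbolic (BDD-based) engine would compute the same sets by repeated image computation.

```python
def successors(states, R):
    """One-step image of a set of states under the transition relation R."""
    return {t for (s, t) in R if s in states}

def c_step_successors(s, R, c):
    """States t such that a c-transition (exactly c steps of R) leads from s to t."""
    frontier = {s}
    for _ in range(c):
        frontier = successors(frontier, R)
    return frontier

def extended_initial_states(S0, R, c):
    """S_init: all states reachable within c - 1 time-steps from S0."""
    reached = set(S0)
    frontier = set(S0)
    for _ in range(c - 1):
        frontier = successors(frontier, R)
        reached |= frontier
    return reached
```

For a six-state ring 0 → 1 → ... → 5 → 0 with c = 3, the only 3-step successor of state 0 is state 3, and the extended initial state set of {0} is {0, 1, 2}.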



Definition 7. Let M and M' be two Kripke structures. A relation $G \subseteq S \times S'$
is a c-slow-bisimulation relation if $G(s, s')$ implies:

1. $\mathcal{L}(s) = \mathcal{L}'(s')$.







2. for every $t \in S$ such that there exists a c-transition of M from state $s$ to
$t$, there exists $t' \in S'$ such that $R'(s', t')$ and $G(t, t')$.

3. for every $t' \in S'$ such that $R'(s', t')$, there exists $t \in S$ such that there exists
a c-transition of M from state $s$ to $t$, and $G(t, t')$.

We say that a c-slow-bisimulation exists from M to M' (denoted by M ^ M') 
iff there exists a c-slow bisimulation relation $G$ such that for all $s \in S_{init}$ and
$t' \in S'_0$, there exist $s' \in S'_0$ and $t \in S_{init}$ such that $G(s, s')$ and $G(t, t')$.
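
For explicit structures, the three conditions of Definition 7 can be checked pair by pair. The following is a hedged sketch rather than a procedure taken from this paper: the relation `G` is a set of `(s, s')` pairs, labels are dictionaries, transition relations are sets of pairs, and all names are assumptions. It checks only that `G` is a c-slow-bisimulation relation; the additional requirement on the extended initial states is not checked here.

```python
def post(states, R):
    """One-step image of a set of states under R."""
    return {t for (s, t) in R if s in states}

def c_post(s, R, c):
    """Targets of all c-transitions leaving s."""
    frontier = {s}
    for _ in range(c):
        frontier = post(frontier, R)
    return frontier

def is_c_slow_bisimulation(G, label, R, label_p, R_p, c):
    """Check conditions 1-3 of the definition for every pair in G.
    (label, R) describe M; (label_p, R_p) describe the abstracted M'."""
    for (s, sp) in G:
        if label[s] != label_p[sp]:                        # condition 1
            return False
        concrete_next = c_post(s, R, c)
        abstract_next = post({sp}, R_p)
        for t in concrete_next:                            # condition 2
            if not any((t, tp) in G for tp in abstract_next):
                return False
        for tp in abstract_next:                           # condition 3
            if not any((t, tp) in G for t in concrete_next):
                return False
    return True
```

As a usage example, take M to be a six-state ring and M' a two-state ring, with c = 3: relating state 0 to the first abstract state and state 3 to the second satisfies all three conditions, whereas the same relation fails for c = 2.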

An infinite path $\pi = (s_0, s_1, s_2, \ldots)$ is a sequence of states such that any two
successive states are related by the transition relation (i.e., $R(s_i, s_{i+1})$). Let $\pi^i$
denote the suffix path $(s_i, s_{i+1}, s_{i+2}, \ldots)$. We say that a c-slow-bisimulation relation
exists between two infinite paths $\pi = (s_0, s_1, s_2, \ldots)$ and $\pi' = (s'_0, s'_1, s'_2, \ldots)$,
denoted by $G(\pi, \pi')$, iff for all $i \ge 0$, we have that $G(s_{c \cdot i}, s'_i)$.

Lemma 1. Let $s$ and $s'$ be states of structures M and M', respectively, such
that $G(s, s')$. For each infinite path $\pi$ starting at $s$ and composed of transitions
of M, there exists an infinite path $\pi'$ starting at $s'$ and composed of transitions
of M' such that $G(\pi, \pi')$. Similarly, for each infinite path $\pi'$ starting at $s'$ and
composed of transitions of M', there exists an infinite path $\pi$ starting at $s$ and
composed of transitions of M such that $G(\pi, \pi')$.

The bisimilarity ensures that the reachable set of the original and abstracted
netlists will be equivalent. Thus, properties such as $AG\,\phi$ and $EF\,\phi$, where
$\phi$ is a Boolean property, are trivially preserved using this abstraction, provided
that all formula signals refer to nets of the same color. This may seem limiting:
a simple formula such as $AG(latchIn \to AX(latchOut))$ reasons about two
differently colored nets. However, such a formula may be captured by an automaton
which transitions upon latchIn = 1 to a state where it samples latchOut;
the new “single-colored” formula asserts that this sampled value = 1. Note that
such transformations of CTL to automata are commonplace for on-the-fly model
checking [9].

One approach to transforming properties for this abstraction is to synthe- 
size the original property (for the unabstracted design), and compose it into the 
model prior to abstraction. The structural abstraction will thereby also abstract 
the property. Logic synthesis algorithms may need to be tuned for optimal use 
of the c-slow abstraction. For example, $p \to AX\,AX\,AX\,q$ may likely be synthesized
as a counter (counting the number of cycles which have occurred since
p). However, such a direct translation would violate the c-slow topology of the
design. We therefore propose a pipelined “one-hot” translation which would,
for example, specifically introduce $N$ state variables rather than $\lceil \log_2(N) \rceil$ for
$AX^N$, if it is determined that the design has a c-slow topology. Translation of
design substructures which violate c-slowness, but may be safely replaced with 
substructures which do not, is an important topic which is not limited to pro- 
perty automata. We did not implement such a “c-slow-friendly translator” for 
our experimentation, as our properties were relatively simple and in many cases 
implemented directly via automata anyway, though we feel that one could be 
readily automated. 
A second approach to abstracting properties is by a dedicated transformation
algorithm. Both approaches place substantial constraints upon the fragment of 
CTL* that our abstraction may handle, as is reflected by the following definition. 

Definition 8. A c-slow reducible (CSR) subformula $\phi$ and its c-slow reduction
$\Omega(\phi)$ are defined inductively as follows.

- every atomic proposition p is a CSR state formula $\phi = p$, and $\Omega(\phi) = p$.

(Note that these atomic propositions are limited to the property-visible nets
V, which are nets of color c - 1.)

- if p is a CSR state formula, so is $\phi = \neg p$, and $\Omega(\phi) = \neg\Omega(p)$.

- if p and q are CSR state formulae, so is $\phi = p \wedge q$, and $\Omega(\phi) = \Omega(p) \wedge \Omega(q)$.

- if p is a CSR path formula, then $\phi = E\,p$ is a CSR state formula, and
$\Omega(\phi) = E\,\Omega(p)$.

- if p is a CSR path formula, then $\phi = A\,p$ is a CSR state formula, and
$\Omega(\phi) = A\,\Omega(p)$.

- each CSR state formula $\phi$ is also a CSR path formula $\phi$.

- if p is a CSR path formula, so is $\phi = \neg p$, and $\Omega(\phi) = \neg\Omega(p)$.

- if p and q are CSR path formulae, so is $\phi = p \wedge q$, and $\Omega(\phi) = \Omega(p) \wedge \Omega(q)$.

- if p is a CSR path formula, so is $\phi = X^c p$, and $\Omega(\phi) = X\,\Omega(p)$. (Note that
strings of less than c X operators must be flattened via translation to
automata.)

If p is a CSR subformula, then $\phi = AG\,p$ and $\phi = EF\,p$ are CSR state formulae,
for which $\Omega(\phi) = AG\,\Omega(p)$ or $\Omega(\phi) = EF\,\Omega(p)$, respectively. These last
transformations are not recursively applicable; EF and AG are only applicable 
as the first tokens in the formula. Furthermore, CSR subformulae are themselves 
insufficient for model checking using this abstraction; the top-level quantification 
is necessary to break the dependence on the initial state of the concrete model. 

Note that p U q is not a CSR formula, since it entails reasoning across consecutive
time-steps. However, since the design may only reason about itself modulo
c, we have not found it beneficial to use such a property to verify the design.
Furthermore, we have found it useful to use a modulo-c variant of the U operator
for such designs. We define $p\,U_c\,q$ as true along a path $(s_i, s_{i+1}, \ldots)$ iff there exists
a j such that $s_j \models q$, and for all states at index k where (j mod c) = (k mod c)
and k < j, we have that $s_k \models p$. Similar approaches apply to the general use of
G and F. 
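
As an illustration of the modulo-c until operator, the following Python sketch checks it on a finite prefix of a path. It is only a sketch under stated assumptions: the function name `holds_until_mod_c`, the list representation of the path, and the predicate arguments `p` and `q` are hypothetical, and a result of False only means that no witness was found within the given prefix.

```python
def holds_until_mod_c(trace, p, q, c):
    """Check the modulo-c until operator on a finite path prefix.

    trace: list of states (index 0 is the first state of the path).
    p, q:  predicates over a single state (hypothetical helpers).
    c:     the c-slow factor.
    """
    for j, state in enumerate(trace):
        if not q(state):
            continue
        # Only earlier states whose index is congruent to j modulo c matter.
        if all(p(trace[k]) for k in range(j % c, j, c)):
            return True
    return False
```

With c = 1 the check degenerates to the ordinary U operator restricted to the prefix, which is a convenient sanity check.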

It is noteworthy that every property which we had verified of the designs 
reported in our experimental results was suitable for c-slow abstraction. As per 
the above discussion, we suspect that all meaningful properties of such netlists 
will either be directly suitable for c-slow abstraction, or may be strengthened to 
be made suitable. 

Theorem 1. Let s and s' be states of M and M', and $\pi = (s_0, s_1, s_2, \ldots)$ and
$\pi' = (s'_0, s'_1, s'_2, \ldots)$ be infinite paths of M and M', respectively. If G is a
c-slow-bisimulation relation such that $G(s, s')$ and $G(\pi, \pi')$, then

1. for every c-slow-reducible CTL* state formula $\phi$, $s \models \phi$ iff $s' \models \Omega(\phi)$.

2. for every c-slow-reducible CTL* path formula $\phi$, $\pi \models \phi$ iff $\pi' \models \Omega(\phi)$.

Proof. The proof is by induction on the length of the formula $\phi$. Our induction
hypothesis is that Theorem 1 holds for all CTL* formulae $\phi'$ of length $\le n$.

Base Case: n = 0. There are no CTL* formulae of length 0, hence this base
case trivially satisfies the induction hypothesis.

Inductive Step: Let $\phi$ be an arbitrary CTL* formula of length n + 1. We
will utilize the induction hypothesis for formulae of length $\le n$ to prove that
Theorem 1 holds for $\phi$. There are seven cases to consider.

1. If $\phi$ is a state formula and an atomic proposition, then $\Omega(\phi) = \phi$. Since s
   and s' share the same labels, we have that
   $s \models \phi \Leftrightarrow s' \models \phi \Leftrightarrow s' \models \Omega(\phi)$.

2. If $\phi$ is a state formula and $\phi = \neg\phi_1$, then $\Omega(\phi) = \neg\Omega(\phi_1)$. Using the
   induction hypothesis (since the length of $\phi_1$ is exactly one less than that
   of $\phi$), we have that $s \models \phi_1 \Leftrightarrow s' \models \Omega(\phi_1)$. Consequently, we have that
   $s \models \phi \Leftrightarrow s \not\models \phi_1 \Leftrightarrow s' \not\models \Omega(\phi_1) \Leftrightarrow s' \models \Omega(\phi)$.

3. If $\phi$ is a state formula and $\phi = \phi_1 \wedge \phi_2$, then $\Omega(\phi) = \Omega(\phi_1) \wedge \Omega(\phi_2)$.
   By definition, we know that $s \models \phi \Leftrightarrow ((s \models \phi_1)$ and $(s \models \phi_2))$. Using
   the induction hypothesis, since the lengths of $\phi_1$ and $\phi_2$ are strictly less
   than the length of $\phi$, we have that $(s \models \phi_1) \Leftrightarrow s' \models \Omega(\phi_1)$, and also
   that $(s \models \phi_2) \Leftrightarrow s' \models \Omega(\phi_2)$. Therefore, we have that
   $s \models \phi \Leftrightarrow (s' \models \Omega(\phi_1)$ and $s' \models \Omega(\phi_2)) \Leftrightarrow s' \models \Omega(\phi)$.

4. If $\phi = E\phi_1$, then $\Omega(\phi) = E\,\Omega(\phi_1)$. The relation $s \models \phi$ is true iff there exists
   an infinite path $\sigma$ beginning at state s such that $\sigma \models \phi_1$. Consider any $\sigma$
   beginning at s (regardless of whether $\sigma \models \phi_1$), and let $\sigma'$ be an infinite path
   beginning at state s' such that $G(\sigma, \sigma')$. Such a $\sigma'$ must exist by Lemma 1.
   Using the induction hypothesis, since the length of $\phi_1$ is exactly one less
   than the length of $\phi$, we have that $\sigma \models \phi_1 \Leftrightarrow \sigma' \models \Omega(\phi_1)$. This implies that
   $s \models \phi \Leftrightarrow s' \models \Omega(\phi)$.

5. If $\phi = A\phi_1$, then $\Omega(\phi) = A\,\Omega(\phi_1)$. The expression $s \models \phi$ is true iff for every
   infinite path $\sigma$ beginning at state s, we have that $\sigma \models \phi_1$. For every infinite
   path $\sigma$ beginning at s, consider the infinite path $\sigma'$ beginning at state s' such
   that $G(\sigma, \sigma')$. Such a path must exist by Lemma 1. Applying the induction
   hypothesis, since the length of $\phi_1$ is exactly one less than the length of $\phi$,
   we have that $\sigma \models \phi_1 \Leftrightarrow \sigma' \models \Omega(\phi_1)$. This implies that
   $s \models \phi \Leftrightarrow s' \models \Omega(\phi)$.

6. Suppose $\phi$ is a path formula which is also a state formula. Since we have
   exhausted the possibilities for state formulae in the other cases, we conclude
   that $\pi \models \phi \Leftrightarrow \pi' \models \Omega(\phi)$.

7. If $\phi = X^c\phi_1$, then $\Omega(\phi) = X\,\Omega(\phi_1)$. Note that the expression $\pi \models \phi$ is true
   iff $\pi^c = (s_c, s_{c+1}, s_{c+2}, \ldots) \models \phi_1$. Since $G(\pi^c, \pi'^1)$, and using the induction
   hypothesis (since the length of $\phi_1$ is less than the length of $\phi$), we have that
   $\pi^c \models \phi_1 \Leftrightarrow \pi'^1 = (s'_1, s'_2, s'_3, \ldots) \models \Omega(\phi_1)$. This is equivalent to
   $\pi \models \phi \Leftrightarrow \pi' \models X\,\Omega(\phi_1)$, and also to $\pi \models \phi \Leftrightarrow \pi' \models \Omega(\phi)$.

Note that one by-product of this abstraction is that it extends the initial state 
set of M' to encompass all states reachable within c - 1 time-steps from the initial
state of M. The above cases prove preservation of CTL* model checking along
each path, and with respect to each state. The necessity of prefixing all c-slow 
reducible CTL* formulae with EF or AG prevents this extended initial set from 
becoming visible to properties. 



Theorem 2. If N' is a c-slow abstraction of N, then N -< N' . 

Proof. Our induction hypothesis is that Theorem 2 holds for all c-slow netlists, 
where c < n. If c < 2, no c-slow abstraction will be performed. 

Base Case: n = 2. 

As depicted in Figure 3, in this case N has two sets of FFs, where X is 
color 0, and Z is color 1. The abstraction (used to generate N') is performed as 
described in Section 3. 




Fig. 3. Original and Abstracted Netlists, N and N’, Base Case (C=2) 



This proof may be readily completed by enumerating an inductive symbolic 
simulation for the two netlists (as in Tables 1 and 2), and demonstrating their 
bisimilarity. This analysis is omitted here due to space constraints. 

Inductive Step: 

This step shows that if the bisimulation holds for c up to n, then it also 
holds for c = n + 1. The induction relies upon the addition of “intermediate”
colored sets of FFs, which are all replaced with MUXes whose selects are
conjunctions of a unique random value (for that color) with first-cycle, depicted in
Figures 4 and 5 as the “inductive units”. A set of zero or more inductive units
is depicted as a cloud within these two figures, and used in the obvious manner
to bring c (along with the explicitly depicted inductive unit) up to n + 1.

This proof also may be completed by an inductive symbolic simulation. 

The proof of correctness for feed-forward pipelines is omitted due to space 
constraints, but follows immediately from the example in the inductive step, by 
omitting the color 0 and c — 1 FFs (and their feedback path). 

Fig. 4. Unabstracted Netlist N, Inductive Case

Fig. 5. Abstracted Netlist N’, Inductive Case

5 Algorithms

Our algorithm for determining the maximum c is similar to that presented in
[2]. We iterate over each FF in the netlist; if unlabeled, it is labeled with an
arbitrary index 0. We next perform a depth-first fanout search from this FF;
each time we encounter a FF, we label it with the current index and increment 
that index for the recursive fanout search from that FF. Once the fanout search 
is finished, we perform a depth-first fanin search from the original FF ; each time 
we encounter an unlabeled FF, we label it with the current index and decrement 
that index for the recursive fanin search from that FF. When an already-labeled 
FF is encountered during a fanin or fanout search, we find the greatest-common 
divisor of the previous “maximum c” and the difference between the previous 
index of this FF and the current index. Once completed, this algorithm (which 
runs in linear time) yields the maximum value of c. If c is equal to 1, no
c-slow abstraction may be performed. If c is never updated during coloring, the
netlist is a feed-forward pipeline. To obtain the FF colorings, we transform the
initial indices modulo c such that the property-visible FFs have color c - 1. For
feed-forward pipelines, we merely shift the initial indices such that the
lowest-numbered ones are equivalent to 0.
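
The labeling procedure above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the experiments: it assumes the netlist has already been reduced to a graph whose nodes are FFs and whose edges mean "fans out to, through combinational logic", it merges the fanout and fanin searches into one traversal, and the names `max_c_slow` and `colors` are hypothetical. A returned c of 0 stands for "c was never updated", i.e. a feed-forward pipeline.

```python
from math import gcd

def max_c_slow(ffs, fanout):
    """Label FFs with indices and return (c, index).

    ffs:    list of FF names.
    fanout: dict mapping each FF to the FFs it drives.
    """
    fanin = {ff: [] for ff in ffs}
    for src in ffs:
        for dst in fanout.get(src, []):
            fanin[dst].append(src)

    index, c = {}, 0
    for root in ffs:
        if root in index:
            continue
        index[root] = 0                      # arbitrary starting index
        stack = [root]
        while stack:
            ff = stack.pop()
            # Fanout neighbours get index + 1, fanin neighbours index - 1.
            steps = [(d, index[ff] + 1) for d in fanout.get(ff, [])]
            steps += [(s, index[ff] - 1) for s in fanin[ff]]
            for other, expected in steps:
                if other in index:
                    # Already labeled: fold the index difference into c.
                    c = gcd(c, abs(expected - index[other]))
                else:
                    index[other] = expected
                    stack.append(other)
    return c, index

def colors(index, c, visible):
    """Shift indices modulo c so that FF `visible` gets color c - 1."""
    shift = (c - 1) - index[visible]
    return {ff: (i + shift) % c for ff, i in index.items()}
```

For a single loop of four FFs the sketch yields c = 4, and adding a feedback path of length two to the same loop lowers the result to 2, as the greatest-common-divisor rule predicts.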

Abstraction of the netlist occurs as in Section 3. Rather than introducing 
c - 1 nondeterministic variables (used only for the calculation of initial values),
we need only $\lceil \log_2 c \rceil$ variables. This is due to the observation that, if the
nondeterministic value conjuncted for the selector of a color i mux is equivalent
to 1, then the nondeterministic values conjuncted with the selectors of all color
j muxes (where j < i) are don’t cares. We may utilize this $\log_2$ nondeterministic
value to determine the color of the FFs whose initial values will propagate to
the remaining FFs. Thus, an abstract initial state may be bound to a unique
timeframe of the concrete design. Note that, regardless of the implementation,
the same random value must be utilized for all color i muxes to ensure atomicity
of the data.

Other than the $\lceil \log_2 c \rceil$ nondeterministic values introduced for initial value
generation, the only other variables introduced are due to splitting of input 
cones. While this may increase the number of inputs of the design by a factor of 
c, we have found in practice that this increase is quite small - a small fraction 
of the total number of inputs - and negligible compared to the number of state 
variables removed. Despite this increase, these split inputs should not cause a 
serious BDD blowup since they tend to form independent (on a per-color basis) 
input ports, which a sophisticated BDD ordering may exploit. 

5.1 Trace Lifting 

We will perform our model checking upon the abstracted netlist N' . However, 
we must translate the traces obtained to the original netlist N. We have found 
that the simplest way to perform the translation is to generate a testcase by 
projecting the trace from N' down to the inputs, and manipulating this testcase 
for application to N. We perform this generation in two steps: a prefix generation, 
and a suffix generation. 

For suffix generation, if no input splitting occurs, we may merely stutter each 
input c — 1 times to convert the testcase. If input splitting does occur, we apply 
the stuttering technique to all non-split inputs. We define the color of an input 
as the color of the FFs to which it fans out. For example, if input A drives FFs 
of color 0 and 2, it will be split into two inputs - A_0 and A_2, of color 0 and 
2, respectively. We use the value upon input A_i at cycle j in the abstract trace
for the value at time $c \cdot j + i$ of input A in the testcase for N. We fill in any
remaining gaps by an arbitrary selection of any legal value; this value does not 
influence the behavior of interest. The suffix generation accounts for the fact 
that every transition of the abstract machine correlates to a c-transition of the 
original machine. 
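
The suffix generation can be illustrated with a short Python sketch. The data layout is an assumption made for the example: an abstract trace is a list of dictionaries (one per abstract cycle) mapping input names to values, `split` maps an original input to its per-color copies, and `lift_suffix` and `fill` are hypothetical names. Each abstract cycle j expands to concrete cycles c·j through c·j + c - 1.

```python
def lift_suffix(abstract_trace, c, split, fill=0):
    """Expand an abstract input trace into a concrete one.

    abstract_trace: list of dicts, abstract input name -> value.
    c:              the c-slow factor.
    split:          dict, original input -> {color: abstract input name}.
    fill:           arbitrary legal value used for the remaining gaps.
    """
    split_names = {n for parts in split.values() for n in parts.values()}
    concrete = []
    for step in abstract_trace:              # abstract cycle j
        for i in range(c):                   # concrete time c*j + i
            # Non-split inputs are simply stuttered.
            cycle = {n: v for n, v in step.items() if n not in split_names}
            # Split input A takes the value of A_i at offset i, if it exists.
            for orig, parts in split.items():
                cycle[orig] = step[parts[i]] if i in parts else fill
            concrete.append(cycle)
    return concrete
```

For the example in the text (input A split into A_0 and A_2 with c = 3), the concrete value of A at times 3j and 3j + 2 comes from the abstract trace, while time 3j + 1 receives the fill value.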

The prefix generation is used to prepend to the suffix trace a path suitable 
to transition N from an initial state to the first state in the suffix trace (which 
is an element of $S_{init}$). Letting nd_init be the value encoded in the $\log_2$ initial
value variables in the initial abstract state, the length of the prefix p is equal to
c - 1 - nd_init. Generating the prefix backwards (and beginning with i = 0), we
iteratively prepend the input values (from the initial value copy of the netlist)
which fan out to color c - 1 - i FFs, then increment i and repeat until i = p.
For feed-forward pipelines, all trace lifting is performed via prefix generation. 

6 Experimental Results 

We utilized IBM’s model checker, RuleBase [10], to obtain our experimental 
results. We arbitrarily selected ten components of IBM’s Gigahertz Processor 
which had previously been model checked. Our algorithm identified two of these 
as being c-slow. The first is a feed-forward pipeline; the second is the five-slow 
pipeline with feedback mentioned in Section 1. Both were explicitly entered in
HDL as c-slow designs - this topology is not the by-product of a synthesis tool.
Both had multi-colored inputs, which our generalized topology was able to
exploit. Both of these components had been undergoing verification and regression
for more than 12 months prior to the development of this abstraction technique. 
Consequently, the unabstracted variants had very good BDD orderings available. 
All results were obtained on an IBM RS/6000 Workstation Model 595 with 2 
GB main memory. RuleBase was run with the most aggressive automated model 
reduction techniques it has to offer (including dual-phase abstraction [5]), and 
with dynamic BDD reordering (Rudell) enabled. 

Prior to running the c-slow abstraction algorithm on these netlists, we ran 
automated scripts which removed scan chain connections between the latches 
(which unnecessarily limited c), and which cut self-feedback on “operational
mode” FFs (into which values are scanned prior to functional use of the netlist,
and held during functional use via the self-feedback loop).

We first deployed this abstraction technique on the feed-forward pipeline. The 
most interesting case was the most complex property against which we verified 
this design. The unabstracted version had 148 variables, and with our best initial 
ordering took 409.6 seconds with a maximum of 1410244 allocated BDD nodes. 
The first run on the abstracted variant (with a random initial ordering) had 53 
variables, and took 44.9 seconds with a maximum of 201224 BDD nodes. While 
this speedup is significant, this comparison is skewed since the unabstracted run 
benefited from the extensive prior BDD reordering. Re-running the unabstracted 
experiment with a random initial ordering took 3657.9 seconds, with 2113255 
BDD nodes. Re-running the abstracted experiment using the ordering obtained 
during the first run as the initial ordering took 4.6 seconds with 98396 nodes. 
Computing the c-slow abstraction took 0.3 seconds. 

The next example is the five-slow design. With a good initial ordering, model 
checking the unabstracted design against one arbitrarily selected formula took 
5526.4 seconds, with 251 variables and 3662500 nodes. The first run of the
abstracted design (with a random initial ordering) took 381.5 seconds, with 134
variables and 339424 nodes. Re-running the rule twice more (and re-utilizing
the calculated BDD orders) yielded a run of 181.1 seconds, 293545 nodes. Model
checking the unabstracted design with a random initial ordering took 23692.5
seconds, 7461703 nodes. Computing the c-slow abstraction took 3.2 seconds. 

Note that, due to the potential increase in depth of combinational cones
entailed by this abstraction, there is a risk of a serious blowup of the
transition relation or function. Splitting or conjoining may be utilized to combat such
blowup [11]. A reasonable ordering seems fairly important when utilizing this
abstraction. One set of experiments was run with reordering off, and a
random initial ordering. The results for the feed-forward pipeline were akin to those
reported above. However, the five-slow abstracted transition relation was
significantly larger than the unabstracted variant given the random ordering, thereby
resulting in a much slower execution than on the unabstracted run. With
reordering enabled, the results (as reported above) were consistently superior.
7 Conclusions and Future Work

We have developed an efficient algorithm for identifying and abstracting
generalized flip-flop based c-slow netlists. Our approach generalizes the definition
provided in [2]; this generality allows us to apply our abstraction to a substantial
percentage of design components. This abstraction is fully automated, and runs
in $O(c \cdot nets)$ time. Our abstraction decreases the number of state variables, and
the diameter of the model by c. Our experimental results indicate the substantial
benefit of this abstraction in reducing verification time and memory (one to two
orders of magnitude improvement), when applicable. We discuss expressibility
constraints on CTL* model checking using this abstraction. 

Future work items involve efficient techniques for identification of netlist 
substructures which violate c-slowness, yet may be safely replaced by others 
which do not. Other work involves extending the class of netlists to which this 
technique may be applied through the splitting of sequential cones. 
Abstract. This paper presents a scalable method for parallel symbolic
reachability analysis on a distributed-memory environment of workstations.
Our method makes use of an adaptive partitioning algorithm which
achieves high reduction of space requirements. The memory balance is
maintained by dynamically repartitioning the state space throughout
the computation. A compact BDD representation allows coordination
by shipping BDDs from one machine to another, where different variable
orders are allowed. The algorithm uses a distributed termination protocol
with none of the memory modules preserving a complete image of the
set of reachable states. No external storage is used on the disk; rather,
we make use of the network which is much faster.

We implemented our method on a standard, loosely-connected environment
of workstations, using a high-performance model checker. Our initial
performance evaluation using several large circuits shows that our
method can handle models that are too large to fit in the memory of
a single node. The efficiency of the partitioning algorithm is linear in
the number of workstations employed, with a 40-60% efficiency. A
corresponding decrease of space requirements is measured throughout the
reachability analysis. Our results show that the relatively-slow network
does not become a bottleneck, and that computation time is kept
reasonably small.



1 Introduction 

This paper presents a scalable parallel algorithm for reachability analysis that 
can handle very large circuits. 

Reachability analysis is known to be a key component, and a dominant one, 
in model checking. In fact, for large classes of properties, model checking is 
reducible to reachability analysis [3] ; most safety properties can be converted into 
state invariant properties (ones that do not contain any temporal operators) by 
adding a small state machine (satellite) that keeps track of the temporal changes 
from the original property. The model checking can be performed “on-the-fly”
during reachability analysis. Thus, for safety properties verification is possible if
reachability analysis is, and an efficient method for model checking this kind
of property is one where the memory bottleneck is the reachable state space.

There is constant work on finding algorithmic and data structure methods to 
reduce the memory requirement bottlenecks that arise in reachability analysis. 
One of the main approaches to reducing memory requirements is symbolic model
checking. This approach uses Binary Decision Diagrams (BDDs) [4] to represent
the verified model. However, circuits of a few hundred state variables may
require many millions of BDD nodes, which often exceeds the computer memory 
capacity. 

There are several proposed solutions to deal with the large memory
requirements by using parallel computation. Several papers suggest replacing the BDD
with a parallelized data structure [11,1]. Stern and Dill [10] show how to
parallelize an explicit model checker that does not use symbolic methods. Other papers
suggest reducing the space requirements by partitioning the work into several
tasks [5,9,8]. However, these methods do not perform any parallel computation.
Rather, they use a single computer to sequentially handle one task at a time, 
while the other tasks are kept in an external memory. Our work is similar, but 
we have devised a method to parallelize the computation of the different tasks. 
Section 7 includes a more detailed comparison with [5] and [8] . 

Our method parallelizes symbolic reachability analysis on a network of
processes with disjoint memory that communicate via message passing. The state
space on which the reachability analysis is performed is partitioned into slices,
where each slice is owned by one process. The processes perform a standard
Breadth First Search (BFS) algorithm on their owned slices. However, the BFS
algorithm used by a process can discover states that do not belong to the slice
that it owns (called non-owned states). When non-owned states are discovered,
they are sent to the process that owns them. As a result, a process only
requires memory for storing the reachable states it owns, and computing the set
of immediate successors for them. As can be seen by the experimental results
in Section 6, communication is not the bottleneck. We can thus conclude that
usually, the number of non-owned states found by a process is small.

Computation on a single slice usually requires less memory than computation 
on the whole set. Thus, this method enables the reachability analysis of bigger 
models than those possible by regular non-parallel reachability analysis.
Furthermore, applying computation in parallel reduces execution time (in practice,
doing computation sequentially makes partitioning useless because of the large 
execution time). 

Effective slicing should significantly increase the size of the overall state space
that can be handled. This is not trivial since low memory requirements of BDDs
are based on sharing among their parts. Our slicing procedure is therefore
designed to avoid as much redundancy as possible in the partitioned slices. This is
achieved by using an adaptive cost function that for each partitioning chooses slices
with small redundancy, while making sure that the partition is not trivial.
Experimental results show that our slicing procedure results in significantly better
slicing than those obtained by fixed cost functions (e.g. [5,8]).

Memory balance is another important factor in making parallel computation
effective. Balance obtained by the initial slicing may be destroyed as more
reachable states are found by each process. Therefore, balancing is dynamically
applied during the computation. Our memory balance procedure maintains
approximately equal memory requirement between the processes, during the entire
computation.

Our method requires passing BDDs between processes, both for sending
non-owned states to their owners and for balancing. For that, we developed a compact
and efficient BDD representation as a buffer of bytes. This representation enables 
different variable orders in the sending and receiving processes. 

We implemented our technique on a loosely-connected distributed environment
of workstations, embedded it in a powerful model checker RuleBase [2],
and tested it by performing reachability analysis on a set of large benchmark
circuits. Compared to execution on a single machine with 512MB memory, the
parallel execution on 32 machines with 512MB memory uses much less space,
and reaches farther when the analysis eventually overflows. Our slicing algorithm
achieves linear memory reduction factor with 40-60% efficiency, which is
maintained throughout the analysis by the memory balancing protocol. The timing
breakdown shows that the communication is not a bottleneck of our approach,
even using a relatively slow network.

The rest of the paper is organized as follows: Section 2 describes the main 
algorithm. Section 3 discusses the slicing procedure and Section 4 suggests several 
possible optimizations. The way to communicate BDD functions is described in 
Section 5. Experimental results prove the efficiency of our method in Section 6. 
Finally, Section 7 concludes with comparison with related works. 



2 Parallel Reachability Analysis 

Computing the set of reachable states is usually done by applying a Breadth 
First Search (BFS) starting from the set of initial states. In general, two sets of 
nodes have to be maintained during the reachability analysis: 

1. The set of nodes already reached, called reachable. This is the set of
reachable states discovered so far, which becomes the set of reachable states when
the exploration ends.

2. The set of reached but not yet developed nodes, called new. 

The right-hand-side of Figure 1 gives the pseudo-code of the BFS algorithm. 

The parallel algorithm is composed of an initial sequential stage, and a
parallel stage. In the sequential stage, the reachable states are computed on a single
node as long as memory requirements are below a certain threshold. When the 
threshold is reached, the algorithm described in Section 3 slices the state space 
into k slices. Then it initiates k processes. Each process is informed of the slice 
it owns, and of the slices owned by each of the other processes. The process 
receives its own slice and proceeds to compute the reachable states for that slice 
in iterative BFS steps. 

During a single step each process computes the set next of states that are 
directly reached from the states in its new set. The next set contains owned 
as well as non-owned states. Each process splits its next set according to the
k slices and sends the non-owned states to their corresponding owners. At the
same time, it receives states it owns from other processes.

1 mySlice = receive(fromSingle);
2 reachable = receive(fromSingle);
3 new = receive(fromSingle);
4 while (Termination(new) == 0) {
5     next = nextStateImage(new);
6     next = sendRecieveAll(next)
7     next = next ∩ mySlice
8     new = next \ reachable;
9     reachable = reachable ∪ next;
  }

(a) BFS by one process

reachable = new = initialStates;
while (new ≠ ∅) {
    next = nextStateImage(new);
    new = next \ reachable;
    reachable = reachable ∪ next;
}

(b) Sequential BFS

Fig. 1. Breadth First Search

The reachability analysis procedure for one process is presented on the left- 
hand-side of Figure 1. Lines 1-3 describe the setup stage: the process receives the 
slice it owns, and the initial sets of states it needs to compute from. The rest of 
the procedure is a repeated iterative computation until distributed termination 
detection is reached. Notice that, the main difference between the two procedures 
in Figure 1 is the modification of the set next in lines 6-7 as the result of 
communication with the other processes. 
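
To make the relationship between the two procedures concrete, the following Python sketch simulates both on explicit state sets. It is an illustration under several assumptions: states are plain Python values rather than BDDs, the k processes run in lock-step inside a single program, the initial sequential stage is skipped, and `image` (the successor function over sets) and `owner` (the mapping from a state to the index of the process that owns it) are hypothetical parameters.

```python
def sequential_bfs(initial, image):
    """Sequential BFS over explicit state sets."""
    reachable = set(initial)
    new = set(initial)
    while new:
        nxt = image(new)
        new = nxt - reachable
        reachable |= nxt
    return reachable

def parallel_bfs(initial, image, owner, k):
    """Lock-step simulation of BFS by k processes, one slice each."""
    reachable = [{s for s in initial if owner(s) == p} for p in range(k)]
    new = [set(r) for r in reachable]
    # `any(new)` stands in for distributed termination detection.
    while any(new):
        produced = [image(new[p]) for p in range(k)]
        # Exchange step: route every discovered state to its owner.
        inbox = [set() for _ in range(k)]
        for states in produced:
            for s in states:
                inbox[owner(s)].add(s)
        for p in range(k):
            new[p] = inbox[p] - reachable[p]
            reachable[p] |= inbox[p]
    return reachable
```

The union of the per-process reachable sets equals the result of the sequential search, and no process ever stores a state it does not own.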

The parallel stage requires an extra process called the coordinator. This
process coordinates the communication between the processes, including exchange
of states, dynamic memory balance, and distributed termination detection.
However, the information does not go through the coordinator and is exchanged
directly between the processes. 

In order to exchange non-owned states, each process sends to the coordinator 
the list of processes it needs to communicate with. The coordinator matches pairs 
of processes and instructs them to communicate. The pairs exchange states in 
parallel and then wait for the coordinator, that may match them with other 
processes. Matching continues until all communication requests are fulfilled. A 
process which ends its interaction may continue to the next step without waiting 
for the rest of the processes to complete their interaction. 
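
The matching performed by the coordinator can be sketched as a greedy pairing. This is a simplification: the sketch groups pairs into synchronous rounds, whereas the text allows a process to proceed as soon as its own interactions are finished, and the function name `schedule_rounds` and the request format are assumptions made for the example.

```python
def schedule_rounds(requests):
    """Greedily match communicating pairs into rounds.

    requests: dict mapping a process id to the ids it needs to talk to.
    Returns a list of rounds; each round is a list of disjoint pairs.
    """
    pending = {frozenset((p, q))
               for p, peers in requests.items() for q in peers if p != q}
    rounds = []
    while pending:
        busy, this_round = set(), []
        for pair in sorted(pending, key=sorted):
            if not (pair & busy):            # both processes are still free
                this_round.append(tuple(sorted(pair)))
                busy |= pair
        pending -= {frozenset(pair) for pair in this_round}
        rounds.append(this_round)
    return rounds
```

Every requested pair is scheduled exactly once, and no process appears in two pairs of the same round.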

2.1 Balancing the Memory Requirement 

One of the objectives of slicing is to distribute an equal memory requirement
amongst the nodes. Initial slicing of the state space is based on the known
reachable set at the beginning of the parallel stage. This slicing may become
inadequate as more states are discovered during reachability analysis. Therefore the
memory requirements of the processes are monitored at each step, and
whenever they become unbalanced, a balance procedure is executed. The coordinator
matches processes that have a large memory requirement with processes that
have a small one. Each pair re-slices the union of their two slices, resulting in a
better balanced slicing. The pair uses the same procedure that is used to slice
the whole state space (described in Section 3) with k = 2. After the balance
procedure is completed, the pair informs the other processes of the new slicing.
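
One possible way for the coordinator to choose the pairs is sketched below in Python. The criterion for "unbalanced" is not specified in the text, so the `ratio` parameter and the function name `match_for_balancing` are assumptions made purely for illustration.

```python
def match_for_balancing(memory, ratio=2.0):
    """Pair heavily loaded processes with lightly loaded ones.

    memory: dict mapping a process id to its current memory requirement.
    ratio:  hypothetical imbalance factor that triggers re-slicing.
    Returns a list of (heavy, light) pairs.
    """
    order = sorted(memory, key=memory.get)   # lightest first
    pairs = []
    lo, hi = 0, len(order) - 1
    while lo < hi and memory[order[hi]] > ratio * memory[order[lo]]:
        pairs.append((order[hi], order[lo]))
        lo += 1
        hi -= 1
    return pairs
```

Each returned pair would then re-slice the union of its two slices with k = 2, as described above.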



2.2 Termination Detection 

In the sequential algorithm, termination is detected when there are no more 
undeveloped states i.e., new is empty. In the parallel algorithm, each process can 
only detect when new is empty in its slice. However, a process may eventually 
receive new states even if at some step its new set is temporarily empty. 

The parallel termination detection procedure starts after the processes ex- 
change all non-owned states. Each process reports to the coordinator whether its 
new set is non-empty. If all the processes report an empty new set, the coordinator 
concludes that termination has been reached and reports this to all the processes. 

3 Boolean Function Slicing 

Symbolic computation represents all the state sets, and the transition relation as 
Boolean functions. This representation becomes large when the sets are big. To 
reduce memory requirements we can partition a set into smaller subsets whose 
union is the whole set. This partition, or slicing should have smaller memory 
requirements. Furthermore, the subsets should be disjoint in order to avoid
duplication of work when doing reachability analysis. Since sets are represented as
Boolean functions, slicing is defined for those functions. 

Definition 1. [Boolean function slicing] [9] Given a Boolean function $f : B^n \rightarrow B$,
and an integer k, a Boolean function slicing $\chi(f, k)$ of f is a set of k function
pairs, $\chi(f, k) = \{(S_1, f_1), \ldots, (S_k, f_k)\}$ that satisfy the following conditions:

1. $S_i$ and $f_i$ are Boolean functions, for $1 \le i \le k$.

2. $S_1 \vee S_2 \vee \ldots \vee S_k = 1$

3. $S_i \wedge S_j = 0$, for $i \ne j$

4. $f_i = S_i \wedge f$, for $1 \le i \le k$.

The Si functions define the slices of the state space and we refer to them as 
slices. 
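
The four conditions of Definition 1 can be checked mechanically on small examples. The Python sketch below represents a Boolean function over n variables explicitly as the set of its satisfying assignments, which is an assumption made for readability (it is not a BDD and does not scale); the names `slice_by_variables` and `is_slicing` are hypothetical.

```python
from itertools import product

def slice_by_variables(f, n, variables):
    """Slice f using every assignment to the chosen variable positions.

    f: set of n-bit tuples (the satisfying assignments of the function).
    Returns a list of (S_i, f_i) pairs.
    """
    slicing = []
    for values in product((0, 1), repeat=len(variables)):
        S = {a for a in product((0, 1), repeat=n)
             if all(a[v] == b for v, b in zip(variables, values))}
        slicing.append((S, S & f))           # f_i = S_i AND f
    return slicing

def is_slicing(slicing, f, n):
    """Check conditions 2-4 of the definition on explicit sets."""
    universe = set(product((0, 1), repeat=n))
    slices = [S for S, _ in slicing]
    covers = set().union(*slices) == universe
    disjoint = all(not (slices[i] & slices[j])
                   for i in range(len(slices))
                   for j in range(i + 1, len(slices)))
    restricted = all(fi == (S & f) for S, fi in slicing)
    return covers and disjoint and restricted
```

Slicing on one variable gives k = 2 and slicing on m variables gives k = 2^m; in both cases the slices cover the space and are pairwise disjoint by construction.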

Reducing memory requirements depends on the choice of the slices $S_1, \ldots, S_k$.
Specifically, when representing functions as BDDs, the memory requirement of
a function f, denoted |f|, is defined as the number of BDD nodes of f. Thus
slicing a function f into two functions $f_1$ and $f_2$ may not necessarily reduce the
requirements. BDDs are compressed decision trees where pointers are joined if
they refer to the same subtree (see Figure 2 for a BDD example). This causes
significant sharing of nodes in the Boolean function (for example node 4 in
Figure 2). As a result of this sharing, a poor choice of $S_1, S_2$ may result in
$|f_1| \approx |f|$, and also $|f_2| \approx |f|$.
Finding a good set of slices is a difficult problem. A possible heuristic
approach to solving this problem is to find a slicing that minimizes $|f_1|, \ldots, |f_k|$.
However as the results of our experiments in Section 6 show, a better approach is
to find a slicing that additionally minimizes the sharing of BDD nodes amongst
the k functions $f_1, \ldots, f_k$.



3.1 Slicing a Function in Two: SelectVar 

Our slicing algorithm, SelectVar, slices a Boolean function (a BDD) into two, 
using assignment of a BDD variable. The algorithm receives a BDD /, and a 
threshold S. It selects one of the BDD variables v and slices / into fv = f 
and fv = f Av. Figure 2 shows an example of such a slicing where the function 
/ is sliced using variable Vi into /i and 
The cost of such a slicing is defined as: 

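Definition 2. [Cost($f$, $v$, $\alpha$)]

$$\mathrm{Cost}(f, v, \alpha) = \alpha \cdot \frac{\max(|f_v|, |f_{\bar v}|)}{|f|} + (1 - \alpha) \cdot \frac{|f_v| + |f_{\bar v}|}{|f|}$$

The factor $\max(|f_v|, |f_{\bar v}|)/|f|$ gives an approximate measure of the reduction achieved by the partition. The factor $(|f_v| + |f_{\bar v}|)/|f|$ gives an approximate measure of the amount of sharing of BDD nodes between $f_v$ and $f_{\bar v}$ (e.g., node 4 in Figure 2), and therefore reflects the redundancy in the partition.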
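
To illustrate how a weight $\alpha$ trades reduction against redundancy, here is a small worked example. The sizes are made up for illustration, not taken from the paper, and the example assumes that the reduction factor is the size of the larger slice divided by $|f|$ and the redundancy factor is the total size of both slices divided by $|f|$.

```latex
% Assumed sizes: |f| = 100.
% Variable v splits f into slices of 60 and 70 nodes (balanced, heavy sharing).
% Variable w splits f into slices of 90 and 20 nodes (unbalanced, little sharing).
\begin{align*}
\mathrm{Cost}(f, v, \alpha) &= \alpha \cdot \tfrac{70}{100} + (1-\alpha) \cdot \tfrac{130}{100} = 1.3 - 0.6\,\alpha \\
\mathrm{Cost}(f, w, \alpha) &= \alpha \cdot \tfrac{90}{100} + (1-\alpha) \cdot \tfrac{110}{100} = 1.1 - 0.2\,\alpha \\
\alpha = 0 &: \quad \mathrm{Cost}(f, v, 0) = 1.3 > 1.1 = \mathrm{Cost}(f, w, 0) \quad (w \text{ wins: less redundancy}) \\
\alpha = 1 &: \quad \mathrm{Cost}(f, v, 1) = 0.7 < 0.9 = \mathrm{Cost}(f, w, 1) \quad (v \text{ wins: better reduction})
\end{align*}
```

Under these assumed numbers the two costs cross at $\alpha = 0.5$, so the preferred variable changes as $\alpha$ grows.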




Fig. 2. Slicing $f$ into $f_1$ and $f_2$



The cost function depends on a choice of $0 \le \alpha \le 1$. An $\alpha = 0$ means that the cost function completely ignores the reduction factor, while $\alpha = 1$ means that the cost function completely ignores the redundancy factor. Our algorithm uses a novel approach in which $\alpha$ is adaptive and its value changes in each application of the slicing algorithm, so that the following goals are achieved: (1) the size of each slice is below the given threshold $\delta$, and (2) redundancy is kept as small as possible.

Initially, the algorithm attempts to find a BDD variable which only minimizes the redundancy factor ($\alpha = 0$), while reducing the memory requirements below the threshold (i.e., $\max(|f_1|, |f_2|) \le |f| - \delta$). If such a slicing does not exist, the algorithm increases $\alpha$ (i.e., allows more redundancy) gradually until $\max(|f_1|, |f_2|) \le |f| - \delta$ is achieved.

The threshold is used to guarantee that the partition is not trivial, i.e., it is not the case that $|f_1| \approx |f|$ and $|f_2| \approx 0$. If the largest slice is approximately $|f| - \delta$ and the redundancy is small, it is very likely that the other slice is approximately of size $\delta$.

The pseudocode for the algorithm SelectVar($f$, $\delta$) is given in Figure 3. We set STEP = min(0.1, …) and $\delta = |f|/k$, where $k$ is the number of overall slices we want to achieve.



α = Δα = STEP
BestVar = the variable v with minimal Cost(f, v, α)
while ((max(|f ∧ BestVar|, |f ∧ ¬BestVar|) > |f| − δ) ∧ (α <= 1))
    α = α + Δα
    BestVar = the variable v with minimal Cost(f, v, α)
return BestVar



Fig. 3. The pseudocode for the algorithm SelectVar($f$, $\delta$).
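
The loop of Figure 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the BDD sizes of the two cofactors of each variable are assumed to be precomputed and passed in as a dictionary, and the cost is assumed to weight the normalized size of the larger slice by `alpha` and the normalized total size by `1 - alpha`. The names `cost`, `select_var`, and `split_sizes` are hypothetical.

```python
def cost(size_f, size_pos, size_neg, alpha):
    """Weighted cost of splitting f on one variable (assumed form)."""
    reduction = max(size_pos, size_neg) / size_f     # size of the larger slice
    redundancy = (size_pos + size_neg) / size_f      # total size, counts sharing twice
    return alpha * reduction + (1 - alpha) * redundancy


def select_var(size_f, split_sizes, delta, step):
    """Adaptive-alpha variable selection, following the loop of Figure 3.

    split_sizes maps each variable v to (|f AND v|, |f AND NOT v|); these
    sizes are computed once per variable and reused for every alpha.
    """
    def best(alpha):
        return min(split_sizes,
                   key=lambda v: cost(size_f, *split_sizes[v], alpha))

    alpha = step
    best_var = best(alpha)
    # Raise alpha until the larger slice fits under |f| - delta, or alpha exceeds 1.
    while max(split_sizes[best_var]) > size_f - delta and alpha <= 1:
        alpha += step
        best_var = best(alpha)
    return best_var
```

Because only `alpha` changes between iterations, each pass over the variables is a handful of arithmetic operations per variable; no BDD operation is repeated.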



Note that, even though our algorithm may compute the cost functions for many different $\alpha$, $|f \wedge v|$ and $|f \wedge \bar{v}|$ are computed only once for each variable $v$; therefore, computation time is not increased. Furthermore, the computation of $|f \wedge v|$ and $|f \wedge \bar{v}|$ for different variables $v$ is done in parallel. Different computers compute the values for the different variables. The values are then sent to a computer that determines the variable with the minimal cost.

Instead of gradually increasing $\alpha$, it is possible to find the best $\alpha$ by binary search. For a model with a large number of BDD variables, and large $k$, this improvement is essential in making our method efficient.

Our slicing procedure is different from those of [9,5] in that we use an adaptive $\alpha$ and put a lot of emphasis on obtaining small redundancy. Since cost functions are computed in parallel, we can allow computing them more precisely, thus achieving better fine-tuning of our slicing. The comparison to the fixed $\alpha$ suggested in [9,5] is given in Figure 4.

3.2 Slicing a Function into k Slices 

Recall that SelectVar may result in two unbalanced slices that are approximately of sizes $|f| - \delta$ and $\delta$. When we aim at a partition of $k$ slices, we therefore use $\delta = |f|/k$ and repeatedly slice the largest slice, until $k$ slices are obtained. In this way we obtain a balanced partition.
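
The "repeatedly slice the largest slice" strategy can be sketched with a priority queue. The sketch is generic and makes assumptions of its own: `split` stands in for a two-way slicing step such as SelectVar followed by the actual split, and `size` stands in for the BDD node count; both are supplied by the caller and are hypothetical names.

```python
import heapq
from itertools import count


def slice_into_k(f, k, split, size):
    """Repeatedly split the currently largest slice until k slices exist.

    split(part) must return exactly two parts; size(part) returns a number.
    """
    tie = count()  # tie-breaker so the heap never compares slices directly
    heap = [(-size(f), next(tie), f)]          # max-heap via negated sizes
    while len(heap) < k:
        _, _, largest = heapq.heappop(heap)
        for part in split(largest):
            heapq.heappush(heap, (-size(part), next(tie), part))
    return [part for _, _, part in heap]
```

Each iteration removes one slice and adds two, so the loop ends after exactly `k - 1` splits.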

4 Optimizing the SelectVar Procedure 

SelectVar($f$, $\delta$) selects a state variable $v$ and uses it to split a set $f$ into two sets: $f \wedge v$ and $f \wedge \bar{v}$. Recall that the algorithm attempts to satisfy two conditions: that the size of both resulting sets is at most $|f| - \delta$, and that the redundancy is minimized.

Fig. 4. Partitioning results measured by two parameters: the redundancy (red), which is the ratio between the overall size of the slices and the original reachable size, and the memory reduction (mem), which is the ratio between the original reachable size and the largest slice in the partition.

The efficiency of this algorithm and its success in meeting the conditions 
are crucial factors in the efficiency of the whole scheme, especially when the 
number of processes increases. In this section we observe that the algorithm can 
be improved in several ways. 

The main observation in improving the slicing procedure is that the split of $f$ that best meets the conditions might not be achieved using a single variable. Indeed, it was previously suggested that the algorithm can achieve better results by the choice of a general function $g$ which determines two sets: $f \wedge g$ and $f \wedge \bar{g}$ [9]. However, since there is an exponential number of candidates for $g$, trying them all will take too much time. In the rest of this section we develop heuristics which help to choose a "good" $g$ while keeping a reasonable complexity for the choice.

We construct $g$ iteratively as follows. Suppose at a certain step we already have $g'$. We now choose a state variable $v$ and compute the cost Cost($f$, $\alpha$, $g$) of all the functions of the form $g = g' \; op \; v$. We use the following options: $g' \wedge v$, $g' \wedge \bar{v}$, $g' \oplus v$, $g' \vee v$, $g' \vee \bar{v}$. Thus, the complexity of computing the cost of all the possibilities at a certain iteration, assuming that we try all state variables, can be as high as five times the number of state variables times the cost of a BDD operation.

In what follows we describe the ways to use the above observation. These are a set of heuristics that we found to be effective (see Table 4). There are several configurable parameters which appear in the description of the heuristics, and which we currently set in our implementation by trial and error.

Optimization 1. The general construction of a splitting function $g$, as described above, is invoked from SelectVar($f$, $\delta$): when a splitting variable BestVar which satisfies the conditions is found, we call ImprovingSplit, which applies the construction with BestVar as a base function in an attempt to further improve the cost.

Optimization 2. We choose a small number $l$ of the best variables found so far and send them as inputs to the general construction of the splitting function. This time, we use each variable only once: we start with the best one, add the second best, etc. If any of the $l - 1$ resulting functions meets the conditions, we are done. Otherwise, if the functions are different from those found at the end of the previous iteration of SelectVar, they are added to the existing list of variables. This increases the input to the next iteration of SelectVar by $l - 1$ functions which have high potential of becoming good slicers.

Optimization 3 (Only very small $\alpha$). We choose the best splitting variable so far, and iteratively add more variables according to the general construction of the slicing function. However, this time we first select those variables for which the resulting function strictly decreases the size of the slices. Only then, out of those variables selected, we choose the one for which the slicing function achieves a minimal cost.

5 Efficient Transfer of BDDs

As described in Section 2, processes periodically exchange BDDs during reachability analysis. Two utility functions are used. bdd2msg translates a BDD into a more compact msg data and msg2bdd translates the msg data back to a BDD after it has been transferred. The purpose of bdd2msg is to serialize the BDD structure in order for it to be suitable for raw buffer transfer.

BDD nodes represent a Boolean function $f$ recursively. The functions 0 and 1 are represented by special BDDs called ZERO and ONE, respectively. Other functions are represented by a node that contains a variable identification $x$ and two pointers, leftPtr and rightPtr, that point to two other BDD nodes representing $f_x$ and $f_{\bar x}$, respectively. The function $f$ is expressed based on the Shannon expansion: $f = x f_x + \bar{x} f_{\bar x}$.

The msg data is a sequence of records. Each msg record has four fields: an index for that record (a symbolic pointer), denoted Sid; the variable id of the record, denoted Xid; an Sid for the record's left son; and an Sid for its right son. The index field indicates the record's location in the msg data. The records ZERO and ONE have special indices.

bdd2msg traverses the nodes of BDD $f$ in Depth First Search (DFS) order. It creates the corresponding msg records from the leaves upwards. Every time it creates a new msg record, it increments an index, which serves as the Sid for that record. msg2bdd traverses the msg records sequentially from start to end. It creates the corresponding BDD nodes one by one as it traverses the data. Shannon expansion is used to create the BDD node from the record. Such a transformation is possible due to the fact that the Xids remain constant throughout the computation.
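
The round trip described above can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: nodes are plain Python objects, the terminals ZERO and ONE are given the special Sids 0 and 1, regular records are numbered from 2, and the Sid of the root is returned next to the record list. The `Node` class and the tuple layout of a record are hypothetical.

```python
from dataclasses import dataclass

ZERO, ONE = "ZERO", "ONE"
TERMINAL_SID = {ZERO: 0, ONE: 1}   # assumed special indices for the terminals


@dataclass(frozen=True)
class Node:
    xid: int        # variable id (Xid)
    left: object    # left son: Node, ZERO or ONE
    right: object   # right son: Node, ZERO or ONE


def bdd2msg(root):
    """Serialize a BDD into records (sid, xid, left_sid, right_sid), leaves first."""
    records, sid_of = [], {}

    def visit(node):
        if isinstance(node, str):              # terminal
            return TERMINAL_SID[node]
        if id(node) not in sid_of:             # shared nodes are emitted once
            left_sid, right_sid = visit(node.left), visit(node.right)
            sid = len(records) + 2
            records.append((sid, node.xid, left_sid, right_sid))
            sid_of[id(node)] = sid
        return sid_of[id(node)]

    return records, visit(root)


def msg2bdd(records, root_sid):
    """Rebuild the BDD by one sequential pass over the records."""
    node_of = {0: ZERO, 1: ONE}
    for sid, xid, left_sid, right_sid in records:
        # Sons always precede their parent, so both lookups succeed.
        node_of[sid] = Node(xid, node_of[left_sid], node_of[right_sid])
    return node_of[root_sid]
```

Because records are emitted from the leaves upwards, the receiver never meets a symbolic pointer to a record it has not yet seen, which is what allows the single sequential pass.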

Remark: The transferred BDD / is compressed by the restrict operator 
described in [6], using the slice of the receiving process as the restricting domain. 



6 Experimental Results 

In this section we report initial performance results of using our approach. We 
implemented our partitioned BDD and embedded it in an enhanced version of 
McMillan’s SMV [7], due to IBM Haifa Research Laboratory [2]. 

Our parallel testbed includes 32 RS6000 machines, each consisting of a 225MHz PowerPC processor and 512MB memory. The communication between the nodes consists of a 16 Mbit/second token ring. The nodes are non-dedicated; i.e., they are mostly workstations of employees who would often use them (and the network) at the same time that we ran our experiments.

We experimented using five of the largest circuits we found in the benchmarks ISCAS89+addendum'93. We also used two large examples (BIQ and ARB) which are components in IBM's Gigahertz processor. Characteristics of the seven circuits are given in Figure 5.

Fig. 5. Characteristics of our benchmark suite, taken from ISCAS89+addendum'93 and IBM's Gigahertz processor. All sizes are given in BDD nodes, and all times in seconds. Max reachable is the maximal (over the steps) set of nodes already reached. Max new is the maximal (over the steps) set of nodes reached but not yet developed. Note that new may be larger than reachable (at any step), since the joint BDD representation of the current step's new and the previous step's reachable may reduce in size. The peak is the maximal size at any point during a step. In order to mask the effect of garbage collection (gc) scheduling decisions, the peak is measured after every gc invocation. Fixed point is the number of steps/time it takes to get to a fixed point. Ov(x) means memory overflow at step x. The time was measured using an RS6000 machine, consisting of a 225MHz PowerPC processor with 512MB memory.



6.1 Slicing Results 

The success of our slicing algorithm is a crucial factor in the efficiency of the 
parallel execution. This success is indicated by two parameters of the obtained 
partition: the redundancy, which is the ratio between the overall size of the 
slices and the original reachable size, and the memory reduction, which is the 
ratio between the original reachable size and the largest slice in the partition. 
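
The two quality measures just defined are simple ratios; the sketch below spells them out. The function names and the example sizes are hypothetical and serve only as an illustration.

```python
def redundancy(slice_sizes, original_size):
    """Overall size of the slices divided by the original reachable size."""
    return sum(slice_sizes) / original_size


def memory_reduction(slice_sizes, original_size):
    """Original reachable size divided by the size of the largest slice."""
    return original_size / max(slice_sizes)
```

For instance, slicing a set of 100 nodes into slices of 30, 40, and 50 nodes gives a redundancy of 1.2 and a memory reduction of 2. A redundancy of 1 would mean that the slices' sizes add up exactly to the original size, and a memory reduction equal to the number of slices would mean that no slice exceeds its even share of the original size.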

Figure 4 presents the slicing results of reachable sets for four slicing methods. In order to show phenomena which appear only for a large number of slices, results in Figure 4 are given for 16, 32, 64, and 130 slices. The slicing algorithms are invoked when the size of the reachable set exceeds the threshold of 100,000 BDD nodes.

The first method selects as a slicing function the variable which achieves the biggest memory reduction. In algorithm SelectVar this corresponds to choosing $\alpha = 1$. The second method is the same as that used in Cabodi et al. [5]. This corresponds to choosing the splitting variable with the best fixed $\alpha$. The third method is the one presented in Section 3, adapting $\alpha$ to select the partition with minimal redundancy. The fourth method includes the optimizations described in Section 4, so that splitting is carried out using a general function.

The table shows that the average increase in the memory reduction which can be attributed to our optimizations (adapting $\alpha$ and choosing a general splitting function) is 25%, 22%, 18%, and 10% for slicing into 130, 65, 32, and 16 parts, respectively. We conclude that these optimizations become more important as the level of slicing increases. This indicates that the key to better slicing when the number of slices increases is to opt for lower duplication, which is the base orientation for our optimizations.

The average memory reduction factor achieved over our benchmark suite for slicing into 130 slices is more than 55. We expect the results to improve for high slicing levels when a larger threshold is chosen. The reason is that the threshold per slice, in our experiments 100,000/130 ≈ 770 nodes, may be too small. On the other hand, efficiency dictates an earlier split when the bottleneck is the complexity of the slicing algorithm, or the resources required by the initial sequential stage.



6.2 Parallel Reachability Space Reduction 

We now present the results for reachability analysis of the benchmark suite using our 32-machine testbed. Figures 6 to 12 summarize the memory usage, giving the reachable size and peak usage for every step. Each of the graphs compares the memory usage in the single-machine execution to that of the parallel system. For the parallel system we give both average and highest memory utilization in any of the machines.

Fig. 6. Memory utilizations during reachability analysis of prolog.



The graphs show that scalability is obtained due to the performance of the slicing algorithm, which achieves a good memory reduction. The circuits which overflow always reach a farther step with the parallel execution than with the single machine. Figure 11 shows the analysis process for circuit BIQ, which safely reaches step 32 with the parallel execution. BIQ reaches only step 22 with the single-machine execution.

One of the crucial factors in the success of the parallel reachability scheme is the dynamic memory balancing, which is in charge of maintaining the "accomplishments" of the slicing algorithm. The ratio of worst to average space usage in the graphs indicates that our dynamic load balancing algorithm succeeds in avoiding extreme imbalance.

Note that the measures of peak size are subject to the gc scheduling policy; this explains the phenomenon appearing, e.g., in Figure 7, where the reachable set shrinks while the peak remains very high.

6.3 Parallel Reachability Timing and Communication

Figure 13 gives the timing breakdown for reachability analysis on the benchmark suite. This table provides information regarding the ratio of computation (compute) to communication (exchange) and memory balancing (balance) in our scheme. The table shows that the overall picture is fairly balanced. In other words, the table shows that communication is not a bottleneck in our algorithm, despite the fact that we use a relatively slow network.

7 Comments on Related Work 

In this section we discuss the improvements of our algorithm over the reachability 
analysis algorithms presented in [5,8]. Their algorithms slice the computation, 
but do not parallelize it. We also summarize the special consideration needed by 
parallel implementation. 

The slicing suggested in [5] is more general than that in [8]. However, as shown in Section 6 above, its effectiveness is limited to a small number of slices. The adaptive $\alpha$ and the general slicing functions used by our algorithm proved scalable and worked well even with 130 slices. Experimental results show that the impact of our optimizations increases with the level of slicing.

Balancing the memory requirements among slices during computation increases the overall reduction. The algorithm in [8] does not include any balancing. The balancing suggested by [5] constantly increases the number of slices. Our balancing method keeps the number of slices fixed, while successfully maintaining the work balanced. This is important when the network size is fixed.

At the end of each step, [5] and [8] write to the disk the sets reachable and new, obtained for the slice under consideration. Instead, we send over the network (which is much faster) only the non-owned part of new, which is relatively small.

In [8] the authors comment that they believe their algorithm can be parallelized. This, however, is not immediate. In order to exploit the full power of the parallel machinery, we had to adapt the BFS for asynchronous computation, coordinate and minimize communications, avoid unnecessary blocking, and employ distributed termination detection.

Fig. 7. Memory utilizations during reachability analysis of s1269.

(a) Size of reachable states set

Fig. 8. Memory utilizations during reachability analysis of s3330.

(a) Size of reachable states set (b) Nodes allocated (peak)

Fig. 9. Memory utilizations during reachability analysis of s1423.

(a) Size of reachable states set (b) Nodes allocated (peak)

Fig. 10. Memory utilizations during reachability analysis of s5378.

Fig. 11. Memory utilizations during reachability analysis of BIQ.

Fig. 12. Memory utilizations during reachability analysis of ARB.

Fig. 13. Timing data (seconds) for parallel execution on 32×512MB machines. Each of the measures is the worst sample over all the machines. The steps count shows that in the case of overflow we got farther from where the 512MB single-machine experiment gave up (given in brackets). The sequential stage shows the time it took to get to the threshold where slicing is invoked. The total parallel is the total time over all steps, including computing, exchanging non-owned states, memory balancing, and garbage collection time. Note that the total time is the maximum over sums and not the sum over maxima. Note that communication time is counted only in the exchanging non-owned and balancing columns.

Our system uses a powerful model checker, which shows that our scheme integrates nicely with state-of-the-art tools. As a result, we were able to experiment with very large circuits, reaching farther steps. For instance, [5] report overflow in step 4 of s5378, while our system reached fixed point at step 44.

Abstract. We develop an automata-theoretic framework for reasoning about infinite-state sequential systems. Our framework is based on the observation that states of such systems, which carry a finite but unbounded amount of information, can be viewed as nodes in an infinite tree, and transitions between states can be simulated by finite-state automata. Checking that the system satisfies a temporal property can then be done by an alternating two-way tree automaton that navigates through the tree. As has been the case with finite-state systems, the automata-theoretic framework is quite versatile. We demonstrate it by solving several versions of the model-checking problem for $\mu$-calculus specifications and prefix-recognizable systems, and by solving the realizability and synthesis problems for $\mu$-calculus specifications with respect to prefix-recognizable environments.



1 Introduction 

One of the most significant developments in the area of formal design verification is the discovery of algorithmic methods for verifying temporal-logic properties of finite-state systems [CES86,LP85,QS81,VW86]. In temporal-logic model checking, we verify the correctness of a finite-state system with respect to a desired behavior by checking whether a labeled state-transition graph that models the system satisfies a temporal logic formula that specifies this behavior (for a survey, see [CGP99]). Symbolic methods that enable model checking of very large state spaces, and the great ease of use of fully algorithmic methods, led to industrial acceptance of temporal model checking [BBG+94].

An important research topic over the past decade has been the application of model checking to infinite-state systems. Notable successes in this area have been the application of model checking to real-time and hybrid systems (cf. [HHWT95,LPY97]). Another active thrust of research is the application of model checking to infinite-state sequential systems. These are systems in which a state carries a finite, but unbounded, amount of information, e.g., a pushdown store. The origin of this thrust is the important result by Muller and Schupp that the monadic second-order theory of context-free graphs is decidable [MS85]. As the complexity involved in that decidability result is nonelementary, researchers sought decidability results of elementary complexity. This started with Burkart and Steffen, who developed an exponential-time algorithm for model-checking formulas in the alternation-free $\mu$-calculus with respect to context-free graphs [BS92]. Researchers then went on to extend this result to the $\mu$-calculus, on one hand, and to more general graphs on the other hand, such as pushdown graphs [BS99a,Wal96], regular graphs [BQ96], and prefix-recognizable graphs [Cau96]. The most powerful result so far is an exponential-time algorithm by Burkart for model checking formulas of the $\mu$-calculus with respect to prefix-recognizable graphs [Bur97b]. See also [BCMS00,BE96,BEM97,BS99b,Bur97a,FWW97].

* Supported in part by NSF grant CCR-9700061, and by a grant from the Intel Corporation.

In this paper we develop an automata-theoretic framework for reasoning about infinite-state sequential systems. The automata-theoretic approach uses the theory of automata as a unifying paradigm for system specification, verification, and synthesis [WVS83,EJ91,Kur94,VW94,KVW00]. Automata enable the separation of the logical and the algorithmic aspects of reasoning about systems, yielding clean and asymptotically optimal algorithms. The automata-theoretic framework for reasoning about finite-state systems has proven to be very versatile. Automata are the key to techniques such as on-the-fly verification [GPVW95], and they are useful also for modular verification [KV98], partial-order verification [GW94,WW96], verification of real-time and hybrid systems [HKV96,DW99], and verification of open systems [AHK97,KV99]. Many decision and synthesis problems have automata-based solutions and no other solution for them is known [EJ88,PR89,KV00]. Automata-based methods have been implemented in industrial automated-verification tools (cf. COSPAN [HHK96] and SPIN [Hol97,VB99]).

The automata-theoretic approach, however, has long been thought to be inapplicable for effective reasoning about infinite-state systems. The reason, essentially, lies in the fact that the automata-theoretic techniques involve constructions in which the state space of the system directly influences the state space of the automaton (e.g., when we take the product of a specification automaton with the graph that models the system). On the other hand, the automata we know how to handle have finitely many states. The key insight, which enables us to overcome this difficulty, and which is implicit in all previous decidability results in the area of infinite-state sequential systems, is that in spite of the somewhat misleading terminology (e.g., "context-free graphs" and "pushdown graphs"), the classes of infinite-state graphs for which decidability is known can be described by finite-state automata. This is explained by the fact that the states of the graphs that model these systems can be viewed as nodes in an infinite tree and transitions between states can be expressed by finite-state automata. As a result, automata-theoretic techniques can be used to reason about such systems. In particular, we show that various problems related to the analysis of such systems can be reduced to the emptiness problem for alternating two-way tree automata, which was recently shown to be decidable in exponential time [Var98].

We first show how the automata-theoretic framework can be used to solve the $\mu$-calculus model-checking problem with respect to context-free and prefix-recognizable systems. While our framework does not establish new complexity results for model checking of infinite-state sequential systems, it appears to be, like the automata-theoretic framework for finite-state systems, very versatile, and it has further potential applications. We demonstrate it by showing how the $\mu$-calculus model-checking algorithm can be extended to graphs with regular state properties, to graphs with regular fairness constraints, to $\mu$-calculus with backwards modalities, and to checking realizability of $\mu$-calculus formulas with respect to infinite-state sequential environments. In each of these problems all we have to demonstrate is a (fairly simple) reduction to the emptiness problem for alternating two-way tree automata; the (exponentially) hard work is then done by the emptiness-checking algorithm.



2 Preliminaries 



2.1 Labeled Rewrite Systems 

A labeled transition graph is a quadruple $G = \langle S, Act, \rho, s_0 \rangle$, where $S$ is a (possibly infinite) set of states, $Act$ is a finite set of actions, $\rho \subseteq S \times Act \times S$ is a labeled transition relation, and $s_0 \in S$ is an initial state. When $\rho(s, a, s')$, we say that $s'$ is an $a$-successor of $s$, and $s$ is an $a$-predecessor of $s'$. For a state $s \in S$, we denote by $G^s = \langle S, Act, \rho, s \rangle$ the graph $G$ with $s$ as its initial state. A rewrite system is a quadruple $\mathcal{R} = \langle V, Act, R, x_0 \rangle$, where $V$ is a finite alphabet, $Act$ is a finite set of actions, $R$ maps each action $a$ to a finite set of rewrite rules, to be defined below, and $x_0 \in V^*$ is an initial word. Intuitively, $R(a)$ describes the possible rules that can be applied by taking the action $a$. We consider here two types of rewrite systems. In a context-free rewrite system, each rewrite rule is a pair $\langle A, x \rangle \in V \times V^*$. In a prefix-recognizable rewrite system, each rewrite rule is a triple $\langle \alpha, \beta, \gamma \rangle$ of regular expressions over $V$, each defining a subset of $V^*$. We refer to rewrite rules in $R(a)$ as $a$-rules.

The rewrite system $\mathcal{R}$ induces the labeled transition graph

$$G_{\mathcal{R}} = \langle V^*, Act, \rho_{\mathcal{R}}, x_0 \rangle,$$

where $\langle x, a, y \rangle \in \rho_{\mathcal{R}}$ if there is a rewrite rule in $R(a)$ whose application on $x$ results in $y$. Formally, if $\mathcal{R}$ is a context-free rewrite system, then $\rho_{\mathcal{R}}(A \cdot y, a, x \cdot y)$ if $\langle A, x \rangle \in R(a)$. If $\mathcal{R}$ is a prefix-recognizable system, then $\rho_{\mathcal{R}}(z \cdot y, a, x \cdot y)$ if there are regular expressions $\alpha$, $\beta$, and $\gamma$ such that $z \in \alpha$, $y \in \beta$, $x \in \gamma$, and $\langle \alpha, \beta, \gamma \rangle \in R(a)$. A labeled transition graph that is induced by a context-free rewrite system is called a context-free graph. A labeled transition graph that is induced by a prefix-recognizable rewrite system is called a prefix-recognizable graph. Note that in order to apply an $a$-transition in state $x$ of a context-free graph, we only need to match the first letter of $x$ with the first element of an $a$-rule. On the other hand, in an application of an $a$-transition in a prefix-recognizable graph, we should find an $a$-rule and a partition of $x$ into a prefix that belongs to the first element of the rule and a suffix that belongs to the second element.

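Example 1. The context-free rewrite system $\langle \{A, B\}, \{a, b\}, R, A \rangle$, with $R(a) = \{\langle A, AB \rangle\}$ and $R(b) = \{\langle A, \varepsilon \rangle, \langle B, \varepsilon \rangle\}$, induces the labeled transition graph sketched below.

[Figure: the induced transition graph. The $a$-edges form the chain $A \to AB \to ABB \to ABBB \to \cdots$; a $b$-edge leads from each $AB^i$ to $B^i$, and from each $B^{i+1}$ to $B^i$.]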
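
The two kinds of rewrite steps can be sketched in Python as follows. The sketch is an illustration under assumptions: words are Python strings over single-character letters, the empty string stands for the empty word, and the regular expressions of a prefix-recognizable rule are written in Python's `re` syntax. The function names are hypothetical.

```python
import re


def context_free_successors(word, rules):
    """All (action, successor) pairs of `word` in a context-free rewrite system.

    rules maps each action to a list of pairs (A, x): rewrite the first
    letter A of the word into the word x.
    """
    result = set()
    for action, pairs in rules.items():
        for head, replacement in pairs:
            if word[:1] == head:               # only the first letter is matched
                result.add((action, replacement + word[1:]))
    return result


def prefix_recognizable_step(src, action, dst, rules):
    """Is there an `action`-transition from src to dst?

    rules maps each action to a list of triples (alpha, beta, gamma) of
    regular expressions; src = z.y and dst = x.y with z in alpha, y in beta
    and x in gamma.
    """
    for alpha, beta, gamma in rules.get(action, []):
        for cut in range(len(src) + 1):        # try every split src = z.y
            z, y = src[:cut], src[cut:]
            if not dst.endswith(y):
                continue
            x = dst[:len(dst) - len(y)]
            if (re.fullmatch(alpha, z) and re.fullmatch(beta, y)
                    and re.fullmatch(gamma, x)):
                return True
    return False
```

With the rules of Example 1, `context_free_successors("ABB", {"a": [("A", "AB")], "b": [("A", ""), ("B", "")]})` contains an $a$-edge to `"ABBB"` and a $b$-edge to `"BB"`. The second function only tests a given pair of words, because the set of successors in a prefix-recognizable graph can be infinite.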



We define the size $|R|$ of $R$ as the space required in order to encode the rewrite rules in $R$. Thus, in the case of a context-free rewrite system,

$$|R| = \sum_{a \in Act} \; \sum_{\langle A, x \rangle \in R(a)} |x|,$$

and in a prefix-recognizable rewrite system,

$$|R| = \sum_{a \in Act} \; \sum_{\langle \alpha, \beta, \gamma \rangle \in R(a)} |U_\alpha| + |U_\beta| + |U_\gamma|,$$

where $|U_r|$ is the size of a nondeterministic automaton provided for the regular expression $r$.

2.2 $\mu$-Calculus

The $\mu$-calculus is a modal logic augmented with least and greatest fixpoint operators [Koz83]. Given a finite set $Act$ of actions and a finite set $Var$ of variables, a $\mu$-calculus formula (in a positive normal form) over $Act$ and $Var$ is one of the following:

- true, false, or $y$ for all $y \in Var$;

- $\varphi_1 \wedge \varphi_2$ or $\varphi_1 \vee \varphi_2$, for $\mu$-calculus formulas $\varphi_1$ and $\varphi_2$;

- $[a]\varphi$ or $\langle a \rangle \varphi$, for $a \in Act$ and a $\mu$-calculus formula $\varphi$;

- $\mu y.\varphi$ or $\nu y.\varphi$, for $y \in Var$ and a $\mu$-calculus formula $\varphi$.

A sentence is a formula that contains no free variables from $Var$ (that is, all the variables are in the scope of some fixed-point operator). We define the semantics of $\mu$-calculus with respect to a labeled transition graph $G = \langle S, Act, \rho, s_0 \rangle$ and a valuation $\mathcal{V} : Var \to 2^S$ for its free variables. Each formula $\psi$ and valuation $\mathcal{V}$ then define a set $\psi^G(\mathcal{V})$ of states of $G$ that satisfy the formula. For a valuation $\mathcal{V}$, a variable $y \in Var$, and a set $S' \subseteq S$, we denote by $\mathcal{V}[y \leftarrow S']$ the valuation obtained from $\mathcal{V}$ by assigning $S'$ to $y$. The mapping $\psi^G$ is defined inductively as follows:

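- $\mathrm{true}^G(\mathcal{V}) = S$ and $\mathrm{false}^G(\mathcal{V}) = \emptyset$;

- For $y \in Var$, we have $y^G(\mathcal{V}) = \mathcal{V}(y)$;

- $(\psi_1 \wedge \psi_2)^G(\mathcal{V}) = \psi_1^G(\mathcal{V}) \cap \psi_2^G(\mathcal{V})$;

- $(\psi_1 \vee \psi_2)^G(\mathcal{V}) = \psi_1^G(\mathcal{V}) \cup \psi_2^G(\mathcal{V})$;

- $([a]\psi)^G(\mathcal{V}) = \{s \in S : \text{for all } s' \text{ such that } \rho(s, a, s'), \text{ we have } s' \in \psi^G(\mathcal{V})\}$;

- $(\langle a \rangle \psi)^G(\mathcal{V}) = \{s \in S : \text{there is } s' \text{ such that } \rho(s, a, s') \text{ and } s' \in \psi^G(\mathcal{V})\}$;

- $(\mu y.\psi)^G(\mathcal{V}) = \bigcap \{S' \subseteq S : \psi^G(\mathcal{V}[y \leftarrow S']) \subseteq S'\}$;

- $(\nu y.\psi)^G(\mathcal{V}) = \bigcup \{S' \subseteq S : S' \subseteq \psi^G(\mathcal{V}[y \leftarrow S'])\}$.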

Note that $\psi^G$ cares only about the valuation of free variables in $\psi$. In particular, no valuation is required for a sentence. For a state $s \in S$ and a sentence $\psi$, we say that $\psi$ holds at $s$ in $G$, denoted $G, s \models \psi$, iff $s \in \psi^G$. Also, $G \models \psi$ iff $G, s_0 \models \psi$.
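
For a finite graph, the inductive semantics above can be evaluated directly. The sketch below is an illustration only and covers finite state sets, whereas the paper targets infinite-state graphs. It computes the least fixpoint by iterating from the empty set and the greatest fixpoint by iterating from the full state set, which on a finite graph with formulas in positive normal form agrees with the intersection and union characterizations. The tuple encoding of formulas and the name `evaluate` are hypothetical.

```python
def evaluate(phi, states, rho, val):
    """Set of states satisfying phi on a finite labeled transition graph.

    phi is a nested tuple: ("true",), ("false",), ("var", y), ("and", f, g),
    ("or", f, g), ("box", a, f), ("dia", a, f), ("mu", y, f), ("nu", y, f).
    rho is a set of triples (s, a, s'); val maps free variables to state sets.
    """
    kind = phi[0]
    if kind == "true":
        return set(states)
    if kind == "false":
        return set()
    if kind == "var":
        return set(val[phi[1]])
    if kind in ("and", "or"):
        left = evaluate(phi[1], states, rho, val)
        right = evaluate(phi[2], states, rho, val)
        return left & right if kind == "and" else left | right
    if kind in ("box", "dia"):
        target = evaluate(phi[2], states, rho, val)

        def successors(s):
            return {t for (p, a, t) in rho if p == s and a == phi[1]}

        if kind == "box":                      # all a-successors satisfy the body
            return {s for s in states if successors(s) <= target}
        return {s for s in states if successors(s) & target}
    if kind in ("mu", "nu"):
        current = set() if kind == "mu" else set(states)
        while True:                            # fixpoint iteration, finite graphs only
            following = evaluate(phi[2], states, rho, {**val, phi[1]: current})
            if following == current:
                return current
            current = following
    raise ValueError(f"unknown formula kind: {kind}")
```

As a usage example, `("mu", "y", ("or", ("var", "p"), ("dia", "a", ("var", "y"))))` denotes the states from which a state in the valuation of `p` is reachable along `a`-edges.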

2.3 Alternating Two-Way Automata 

Given a finite set $\Upsilon$ of directions, an $\Upsilon$-tree is a set $T \subseteq \Upsilon^*$ such that if $\upsilon \cdot x \in T$, where $\upsilon \in \Upsilon$ and $x \in \Upsilon^*$, then also $x \in T$. The elements of $T$ are called nodes, and the empty word $\varepsilon$ is the root of $T$. For every $\upsilon \in \Upsilon$ and $x \in T$, the node $x$ is the parent of $\upsilon \cdot x$. Each node $x \ne \varepsilon$ of $T$ has a direction in $\Upsilon$. The direction of the root is the symbol $\bot$ (we assume that $\bot \notin \Upsilon$). The direction of a node $\upsilon \cdot x$ is $\upsilon$. We denote by $dir(x)$ the direction of node $x$. An $\Upsilon$-tree $T$ is a full infinite tree if $T = \Upsilon^*$. A path $\pi$ of a tree $T$ is a set $\pi \subseteq T$ such that $\varepsilon \in \pi$ and for every $x \in \pi$ there exists a unique $\upsilon \in \Upsilon$ such that $\upsilon \cdot x \in \pi$. Note that our definitions here reverse the standard definitions (e.g., when $\Upsilon = \{0, 1\}$, the successors of the node 0 are 00 and 10 (rather than 00 and 01)).
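
Since the convention is reversed with respect to the usual one, a few lines of Python may help in fixing it. The sketch is an illustration under assumptions: nodes are strings of single-character directions, the empty string is the root, and the string `"⊥"` stands for the direction of the root. The function names are hypothetical.

```python
BOTTOM = "⊥"  # direction of the root, assumed not to be a regular direction


def direction(node):
    """Direction of a node: its first letter, or BOTTOM for the root."""
    return node[0] if node else BOTTOM


def parent(node):
    """Parent of a non-root node: the node without its first letter."""
    if not node:
        raise ValueError("the root has no parent")
    return node[1:]


def successors(node, directions):
    """Successors of a node: prepend each direction to it."""
    return [d + node for d in directions]


def is_tree(nodes):
    """A set of nodes is a tree if it is closed under taking parents."""
    return all(n == "" or n[1:] in nodes for n in nodes)
```

With directions `"01"`, `successors("0", "01")` yields `["00", "10"]`, matching the remark above that the successors of node 0 are 00 and 10.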

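Given two finite sets $\Upsilon$ and $\Sigma$, a $\Sigma$-labeled $\Upsilon$-tree is a pair $\langle T, V \rangle$ where $T$ is an $\Upsilon$-tree and $V : T \to \Sigma$ maps each node of $T$ to a letter in $\Sigma$. When $\Upsilon$ and $\Sigma$ are not important or clear from the context, we call $\langle T, V \rangle$ a labeled tree. We say that an $((\Upsilon \cup \{\bot\}) \times \Sigma)$-labeled $\Upsilon$-tree $\langle T, V \rangle$ is $\Upsilon$-exhaustive if for every node $x \in T$, we have $V(x) \in \{dir(x)\} \times \Sigma$.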

Alternating automata on infinite trees generalize nondeterministic tree automata and
were first introduced in [MS87]. Here we describe alternating two-way tree automata.
For a finite set $X$, let $\mathcal{B}^+(X)$ be the set of positive Boolean formulas over $X$ (i.e., Boolean
formulas built from elements in $X$ using $\wedge$ and $\vee$), where we also allow the formulas
true and false, and, as usual, $\wedge$ has precedence over $\vee$. For a set $Y \subseteq X$ and a formula
$\theta \in \mathcal{B}^+(X)$, we say that $Y$ satisfies $\theta$ iff assigning true to elements in $Y$ and assigning
false to elements in $X \setminus Y$ makes $\theta$ true. For a set $\Upsilon$ of directions, the extension of $\Upsilon$ is
the set $ext(\Upsilon) = \Upsilon \cup \{\varepsilon, \uparrow\}$ (we assume that $\Upsilon \cap \{\varepsilon, \uparrow\} = \emptyset$). An alternating two-way
automaton over $\Sigma$-labeled $\Upsilon$-trees is a tuple $\mathcal{A} = \langle \Sigma, Q, \delta, q_0, F \rangle$, where $\Sigma$ is the input
alphabet, $Q$ is a finite set of states, $\delta : Q \times \Sigma \to \mathcal{B}^+(ext(\Upsilon) \times Q)$ is the transition
function, $q_0 \in Q$ is an initial state, and $F$ specifies the acceptance condition.
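The satisfaction relation between a set $Y$ and a positive Boolean formula can be sketched as follows. The tagged-tuple encoding (`("atom", x)`, `("and", f, g)`, `("or", f, g)`, plus the constants `True` and `False`) is an assumption of this sketch, not notation from the paper.

```python
def satisfies(Y, theta):
    """Check whether assigning true exactly to the atoms in Y makes theta true.

    theta is True, False, ("atom", x), ("and", f, g) or ("or", f, g).
    """
    if theta is True or theta is False:
        return theta
    tag = theta[0]
    if tag == "atom":
        return theta[1] in Y
    if tag == "and":
        return satisfies(Y, theta[1]) and satisfies(Y, theta[2])
    if tag == "or":
        return satisfies(Y, theta[1]) or satisfies(Y, theta[2])
    raise ValueError(f"unknown tag: {tag}")
```

For a transition formula such as $(\uparrow, q_1) \wedge ((0, q_2) \vee (\varepsilon, q_3))$, the set `{("up", "q1"), ("eps", "q3")}` satisfies it, while `{(0, "q2")}` alone does not. Because the formulas are positive, any superset of a satisfying set is also satisfying.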

A run of an alternating automaton $\mathcal{A}$ over a labeled tree $\langle \Upsilon^*, V \rangle$ is a labeled tree
$\langle T_r, r \rangle$ in which every node is labeled by an element of $\Upsilon^* \times Q$. A node in $T_r$, labeled
by $(x, q)$, describes a copy of the automaton that is in the state $q$ and reads the node $x$
of $\Upsilon^*$. Note that many nodes of $T_r$ can correspond to the same node of $\Upsilon^*$; there is no
one-to-one correspondence between the nodes of the run and the nodes of the tree. The
labels of a node and its successors have to satisfy the transition function. Formally, a run
$\langle T_r, r \rangle$ is a $\Sigma_r$-labeled $\Gamma$-tree, for some set $\Gamma$ of directions, where $\Sigma_r = \Upsilon^* \times Q$ and
$\langle T_r, r \rangle$ satisfies the following:

1. $\varepsilon \in T_r$ and $r(\varepsilon) = (\varepsilon, q_0)$.

2. Consider $y \in T_r$ with $r(y) = (x, q)$ and $\delta(q, V(x)) = \theta$. Then there is a (possibly
empty) set $S \subseteq ext(\Upsilon) \times Q$, such that $S$ satisfies $\theta$, and for all $\langle c, q' \rangle \in S$, there is
$\gamma \in \Gamma$ such that $\gamma \cdot y \in T_r$ and the following hold:

- If $c \in \Upsilon$, then $r(\gamma \cdot y) = (c \cdot x, q')$.

- If $c = \varepsilon$, then $r(\gamma \cdot y) = (x, q')$.

- If $c = \uparrow$, then $x = \upsilon \cdot z$, for some $\upsilon \in \Upsilon$ and $z \in \Upsilon^*$, and $r(\gamma \cdot y) = (z, q')$.

Thus, $\varepsilon$-transitions leave the automaton on the same node of the input tree, and $\uparrow$-transitions
take it up to the parent node. Note that the automaton cannot go up from the root
of the input tree, as whenever $c = \uparrow$, we require that $x \neq \varepsilon$.

^1 As will become clearer in the sequel, the reason for that is that rewrite rules refer to the prefix of
words.

A run $\langle T_r, r \rangle$ is accepting if all its infinite paths satisfy the acceptance condition. We
consider here parity acceptance conditions [EJ91]. A parity condition over a state set $Q$
is a finite sequence $F = \{F_1, F_2, \ldots, F_m\}$ of subsets of $Q$, where $F_1 \subseteq F_2 \subseteq \ldots \subseteq
F_m = Q$. The number $m$ of sets is called the index of $\mathcal{A}$. Given a run $\langle T_r, r \rangle$ and an
infinite path $\pi \subseteq T_r$, let $inf(\pi) \subseteq Q$ be such that $q \in inf(\pi)$ if and only if there are
infinitely many $y \in \pi$ for which $r(y) \in \Upsilon^* \times \{q\}$. That is, $inf(\pi)$ contains exactly all
the states that appear infinitely often in $\pi$. A path $\pi$ satisfies the condition $F$ if there is
an even $i$ for which $inf(\pi) \cap F_i \neq \emptyset$ and $inf(\pi) \cap F_{i-1} = \emptyset$. An automaton accepts a
labeled tree if and only if there exists a run that accepts it. We denote by $\mathcal{L}(\mathcal{A})$ the set
of all $\Sigma$-labeled trees that $\mathcal{A}$ accepts. The automaton $\mathcal{A}$ is nonempty iff $\mathcal{L}(\mathcal{A}) \neq \emptyset$.
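The parity condition on a single path can be checked directly once the set of states visited infinitely often is known. The sketch below is illustrative only; the function name `satisfies_parity` and the list representation of the chain $F_1 \subseteq \ldots \subseteq F_m$ are assumptions.

```python
def satisfies_parity(inf_states, F):
    """Check the parity condition for a path.

    inf_states: set of states occurring infinitely often on the path.
    F: list [F1, ..., Fm] of sets forming an increasing chain (F[0] is F1).
    The path is accepted iff there is an even i with
    inf_states & F_i nonempty and inf_states & F_{i-1} empty.
    """
    for i in range(2, len(F) + 1, 2):        # even indices i = 2, 4, ...
        hits_i = inf_states & F[i - 1]       # F_i is stored at position i-1
        hits_prev = inf_states & F[i - 2]    # F_{i-1}
        if hits_i and not hits_prev:
            return True
    return False
```

Since the sets form a chain, this is the same as asking that the least index $i$ with $inf(\pi) \cap F_i \neq \emptyset$ be even.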

Theorem 1. Given an alternating two-way parity tree automaton A with n states and 
index k, we can construct an equivalent nondeterministic one-way parity tree automaton 
whose number of states is exponential in nk and whose index is linear in nk [Var98], 
and we can check the nonemptiness of A in time exponential in nk [EJS93]. 

2.4 Alternating Automata on Labeled Transition Graphs 

Consider a labeled transition graph $G = \langle S, Act, \rho, s_0 \rangle$. For the set $Act$ of actions, let
$next(Act) = \{\varepsilon\} \cup \bigcup_{a \in Act} \{[a], \langle a \rangle\}$. An alternating automaton on labeled transition
graphs (graph automaton, for short) [JW95]^2 is a tuple $\mathcal{S} = \langle Act, Q, \delta, q_0, F \rangle$, where
$Q$, $q_0$, and $F$ are as in alternating two-way automata, $Act$ is a set of actions, and
$\delta : Q \to \mathcal{B}^+(next(Act) \times Q)$ is the transition function. Intuitively, when $\mathcal{S}$ is in state $q$
and it reads a state $s$ of $G$, fulfilling an atom $\langle \langle a \rangle, t \rangle$ (or $\langle a \rangle t$, for short) requires $\mathcal{S}$ to
send a copy in state $t$ to some $a$-successor of $s$. Similarly, fulfilling an atom $[a]t$ requires
$\mathcal{S}$ to send copies in state $t$ to all the $a$-successors of $s$. Thus, like symmetric automata
[DW99,Wil99], graph automata cannot distinguish between the various $a$-successors of
a state and treat them in an existential or universal way.

Like runs of alternating two-way automata, a run of a graph automaton $\mathcal{S}$ over a
labeled transition graph $G = \langle S, Act, \rho, s_0 \rangle$ is a labeled tree in which every node is
labeled by an element of $S \times Q$. A node labeled by $(s, q)$ describes a copy of the
automaton that is in the state $q$ of $\mathcal{S}$ and reads the state $s$ of $G$. Formally, a run is a
$\Sigma_r$-labeled $\Gamma$-tree $\langle T_r, r \rangle$, where $\Gamma$ is an arbitrary set of directions, $\Sigma_r = S \times Q$, and
$\langle T_r, r \rangle$ satisfies the following:

1. $\varepsilon \in T_r$ and $r(\varepsilon) = (s_0, q_0)$.

2. Consider $y \in T_r$ with $r(y) = (s, q)$ and $\delta(q) = \theta$. Then there is a (possibly empty)
set $S \subseteq next(Act) \times Q$, such that $S$ satisfies $\theta$, and for all $\langle c, q' \rangle \in S$, the following
hold:

- If $c = \varepsilon$, then there is $\gamma \in \Gamma$ such that $\gamma \cdot y \in T_r$ and $r(\gamma \cdot y) = (s, q')$.

- If $c = [a]$, then for every $a$-successor $s'$ of $s$, there is $\gamma \in \Gamma$ such that $\gamma \cdot y \in T_r$
and $r(\gamma \cdot y) = (s', q')$.

- If $c = \langle a \rangle$, then there is an $a$-successor $s'$ of $s$ and $\gamma \in \Gamma$ such that $\gamma \cdot y \in T_r$
and $r(\gamma \cdot y) = (s', q')$.

A run $\langle T_r, r \rangle$ is accepting if all its infinite paths satisfy the acceptance condition. The
graph $G$ is accepted by $\mathcal{S}$ if there is an accepting run on it. We denote by $\mathcal{L}(\mathcal{S})$ the set
of all graphs that $\mathcal{S}$ accepts. We denote by $\mathcal{S}^q = \langle Act, Q, \delta, q, F \rangle$ the automaton $\mathcal{S}$ with
$q$ as its initial state.

^2 The graph automata in [JW95] differ from those defined here, but this is only a technical
difference.

We use graph automata as our specification language. We say that a labeled transition
graph $G$ satisfies a graph automaton $\mathcal{S}$, denoted $G \models \mathcal{S}$, if $\mathcal{S}$ accepts $G$. It is shown in
[JW95] that graph automata are as expressive as the $\mu$-calculus. In particular, we have the
following.

Theorem 2. Given a $\mu$-calculus formula $\psi$ of length $n$ and alternation depth $k$, we
can construct a graph parity automaton $\mathcal{S}_\psi$ such that $\mathcal{L}(\mathcal{S}_\psi)$ is exactly the set of graphs
satisfying $\psi$. The automaton $\mathcal{S}_\psi$ has $n$ states and index $k$.



3 Model Checking of Context-Free Graphs 

In this section we present an automata-theoretic approach to model-checking of context-free
transition systems. Consider a labeled transition graph $G_{\mathcal{R}} = \langle V^*, Act, \rho_{\mathcal{R}}, x_0 \rangle$,
induced by a rewrite system $\mathcal{R} = \langle V, Act, R, x_0 \rangle$. Since the state space of $G_{\mathcal{R}}$ is the full
$V$-tree, we can think of each transition $\langle z, a, z' \rangle \in \rho_{\mathcal{R}}$ as a "jump" that is activated by
the action $a$ from the node $z$ of the $V$-tree to the node $z'$. Thus, if $\mathcal{R}$ is a context-free
rewrite system and we are at node $A \cdot y$ of the $V$-tree, an application of the action $a$ takes
us to nodes $x \cdot y$, for $\langle A, x \rangle \in R(a)$. Technically, this means that we first move up to the
parent $y$ of $A \cdot y$, and then move down along $x$. Such a navigation through the $V$-tree
can be easily performed by two-way automata.
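The "up one step, then down along $x$" reading of a context-free rule can be sketched on words represented as strings whose first character is the top letter. The function name `cf_successors` and the dictionary representation of $R$ are assumptions of this illustration.

```python
def cf_successors(word, a, R):
    """Return the a-successors of a state in a context-free graph.

    word: a string over V, with the head letter A at index 0.
    R: dict mapping each action to a set of pairs (A, x), meaning that
       the head letter A may be rewritten to the word x.
    """
    if word == "":
        return set()                 # the empty word has no successors
    head, rest = word[0], word[1:]   # move up: from A.y to its parent y
    # move down along x: from y to x.y, for every rule (A, x) in R(a)
    return {x + rest for (A, x) in R.get(a, set()) if A == head}
```

For instance, with `R = {"push": {("A", "BA")}, "pop": {("B", "")}}`, the state `"AC"` has the single push-successor `"BAC"`, which in turn has the single pop-successor `"AC"`.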

Theorem 3. Given a context-free rewrite system $\mathcal{R} = \langle V, Act, R, v_0 \rangle$ and a graph
automaton $\mathcal{S} = \langle Act, Q, \delta, q_0, F \rangle$, we can construct an alternating two-way parity
automaton $\mathcal{A}$ over $(V \cup \{\bot\})$-labeled $V$-trees such that $\mathcal{L}(\mathcal{A})$ is not empty iff $G_{\mathcal{R}}$
satisfies $\mathcal{S}$. The automaton $\mathcal{A}$ has $O(|Q| \cdot |R| \cdot |V|)$ states, and has the same index as $\mathcal{S}$.

Proof: The automaton $\mathcal{A}$ checks that the input tree is $V$-exhaustive (that is, each node
is labeled by its direction). As such, $\mathcal{A}$ can learn from the labels it reads the state in $V^*$ that
each node corresponds to. The transition function of $\mathcal{A}$ then consults the rewrite rules in
$R$ in order to transform an atom in $next(Act) \times Q$ to a chain of transitions that spread
copies of $\mathcal{A}$ to the corresponding nodes of the full $V$-tree.

We define $\mathcal{A} = \langle V \cup \{\bot\}, Q', \eta, q'_0, F' \rangle$ as follows.

- $Q' = Q \times tails(\mathcal{R}) \times (V \cup \{\bot, \#\})$, where $tails(\mathcal{R}) \subseteq V^*$ is the set of all suffixes
of words $x \in V^*$ for which there are $a \in Act$ and $A \in V$ such that $\langle A, x \rangle \in R(a)$.
Intuitively, when $\mathcal{A}$ visits a node $x \in V^*$ in state $\langle q, y, A \rangle$, it checks that $G_{\mathcal{R}}$ with
initial state $y \cdot x$ is accepted by $\mathcal{S}^q$. In particular, when $y = \varepsilon$, then $G_{\mathcal{R}}$ with initial
state $x$ (the node currently being visited) needs to be accepted by $\mathcal{S}^q$. In addition,
if $A \neq \#$, then $\mathcal{A}$ also checks that $dir(x) = A$. States of the form $\langle q, \varepsilon, A \rangle$ are
called action states. From these states $\mathcal{A}$ consults $\delta$ and $R$ in order to impose new
requirements on the exhaustive $V$-tree. States of the form $\langle q, y, A \rangle$, for $y \in V^+$, are
called navigation states. From these states $\mathcal{A}$ only navigates downwards along $y$ to reach
new action states. On its way, $\mathcal{A}$ also checks the $V$-exhaustiveness of the input tree.

- In order to define $\eta : Q' \times (V \cup \{\bot\}) \to \mathcal{B}^+(ext(V) \times Q')$, we first define the
function $apply_{\mathcal{R}} : next(Act) \times Q \times (V \cup \{\bot\}) \to \mathcal{B}^+(ext(V) \times Q')$. Intuitively,
$apply_{\mathcal{R}}$ transforms atoms participating in $\delta$, together with a letter $A \in V \cup \{\bot\}$,
which stands for the direction of the current node, to a formula that describes the
requirements on $G_{\mathcal{R}}$ when the rewrite rules in $R$ are applied to words of the form
$A \cdot V^*$. For $c \in next(Act)$, $q \in Q$, and $A \in V \cup \{\bot\}$, we define

$$
apply_{\mathcal{R}}(c, q, A) =
\begin{cases}
\langle \varepsilon, \langle q, \varepsilon, A \rangle \rangle & \text{if } c = \varepsilon, \\
\bigwedge_{\langle A, y \rangle \in R(a)} \langle \uparrow, \langle q, y, \# \rangle \rangle & \text{if } c = [a], \\
\bigvee_{\langle A, y \rangle \in R(a)} \langle \uparrow, \langle q, y, \# \rangle \rangle & \text{if } c = \langle a \rangle.
\end{cases}
$$

Note that $R(a)$ may contain no pairs in $\{A\} \times V^*$ (that is, the transition relation of
$G_{\mathcal{R}}$ may not be total). In particular, this happens when $A = \bot$ (that is, the state $\varepsilon$ of
$G_{\mathcal{R}}$ has no successors). Then, we take empty conjunctions as true, and take empty
disjunctions as false.

In order to understand the function $apply_{\mathcal{R}}$, consider the case $c = [a]$. When $\mathcal{S}$ reads
the state $A \cdot x$ of the input graph, fulfilling the atom $[a]q$ requires $\mathcal{S}$ to send copies
in state $q$ to all the $a$-successors of $A \cdot x$. The automaton $\mathcal{A}$ then sends to the node $x$
copies that check whether all the states $y \cdot x$, with $\rho_{\mathcal{R}}(A \cdot x, a, y \cdot x)$, are accepted
by $\mathcal{S}$ with initial state $q$.

Now, for a formula $\theta \in \mathcal{B}^+(next(Act) \times Q)$, the formula

$$apply_{\mathcal{R}}(\theta, A) \in \mathcal{B}^+(ext(V) \times Q')$$

is obtained from $\theta$ by replacing an atom $\langle c, q \rangle$ by the atom $apply_{\mathcal{R}}(c, q, A)$. We can
now define $\eta$ for all $A \in V \cup \{\bot\}$ as follows.

- $\eta(\langle q, \varepsilon, A \rangle, A) = \eta(\langle q, \varepsilon, \# \rangle, A) = apply_{\mathcal{R}}(\delta(q), A)$.

- $\eta(\langle q, B \cdot y, A \rangle, A) = \eta(\langle q, B \cdot y, \# \rangle, A) = \langle B, \langle q, y, B \rangle \rangle$.

Thus, in action states, $\mathcal{A}$ reads the direction of the current node and applies the rewrite
rules of $\mathcal{R}$ in order to impose new requirements according to $\delta$. In navigation states,
$\mathcal{A}$ needs to go downwards along $B \cdot y$ and check that the nodes it comes across on its way
are labeled by their direction. For that, $\mathcal{A}$ proceeds only with the direction of the
current node (maintained as the third element of the state), and sends to direction $B$
a state whose third element is $B$. Note that since we reach states with $\#$ only with
upward transitions, $\mathcal{A}$ visits these states only when it reads nodes $x$ that have already
been read by a copy of $\mathcal{A}$ that does check whether $x$ is labeled by its direction.

- $q'_0 = \langle q_0, x_0, \bot \rangle$. Thus, in its initial state $\mathcal{A}$ checks that $G_{\mathcal{R}}$ with initial state $x_0$ is
accepted by $\mathcal{S}$ with initial state $q_0$. It also checks that the root of the input tree is
labeled with $\bot$.

- $F'$ is obtained from $F$ by replacing each set $F_i$ by the set $F_i \times tails(\mathcal{R}) \times (V \cup \{\#\})$.

□
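The function $apply_{\mathcal{R}}$ from the proof above can be sketched as a small formula builder. The encoding is an assumption of this sketch: `"eps"` and `"up"` stand for $\varepsilon$ and $\uparrow$, the string `"#"` for the symbol $\#$, states of $\mathcal{A}$ are triples `(q, y, A)`, and conjunctions and disjunctions are represented as `("and", [...])` and `("or", [...])`.

```python
def apply_rule(c, q, A, R):
    """Translate an atom (c, q) of the graph automaton at direction A.

    c: "eps", ("box", a) or ("dia", a).
    R: dict mapping each action to a set of context-free rules (A, y).
    Returns True, False, a single atom, or an and/or over atoms.
    """
    if c == "eps":
        # stay on the current node, in the action state (q, eps, A)
        return ("atom", ("eps", (q, "", A)))
    kind, a = c
    # one upward atom per rule (A, y) in R(a) whose head is the direction A
    atoms = [("atom", ("up", (q, y, "#")))
             for (head, y) in sorted(R.get(a, ())) if head == A]
    if kind == "box":
        return ("and", atoms) if atoms else True    # empty conjunction
    if kind == "dia":
        return ("or", atoms) if atoms else False    # empty disjunction
    raise ValueError(f"unknown atom kind: {kind}")
```

Each upward atom moves to the parent of the current node in a navigation state that remembers the word $y$ still to be traversed downwards. When no rule applies (for instance at the root, whose direction is $\bot$), a `box` atom becomes `True` and a `dia` atom becomes `False`, as stipulated in the proof.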



Context-free rewrite systems can be viewed as a special case of prefix-recognizable
rewrite systems. In the next section we describe how to extend the construction described
above to prefix-recognizable graphs, and we also analyze the complexity of the model-checking
algorithm that follows for the two types of systems.



4 Model Checking of Prefix-Recognizable Graphs

In this section we extend the construction described in Section 3 to prefix-recognizable
transition systems. The idea is similar: two-way automata can navigate through the full
$V$-tree and simulate transitions in a system induced by a rewrite system by a chain of
transitions in the tree. While in context-free transition systems the application of rewrite
rules involved one move up the tree and then a chain of moves down, here things are a
bit more involved. In order to apply a rewrite rule $\langle \alpha, \beta, \gamma \rangle$, the automaton has to move
upwards along a word in $\alpha$, check that the remaining word leading to the root is in $\beta$,
and move downwards along a word in $\gamma$. As we explain below, $\mathcal{A}$ does so by simulating
automata for the regular expressions participating in $R$.

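Theorem 4. Given a prefix-recognizable rewrite system $\mathcal{R} = \langle V, Act, R, v_0 \rangle$ and a
graph automaton $\mathcal{S} = \langle Act, Q, \delta, q_0, F \rangle$, we can construct an alternating two-way
parity automaton $\mathcal{A}$ over $(V \cup \{\bot\})$-labeled $V$-trees such that $\mathcal{L}(\mathcal{A})$ is not empty iff
$G_{\mathcal{R}}$ satisfies $\mathcal{S}$. The automaton $\mathcal{A}$ has $O(|Q| \cdot |R| \cdot |V|)$ states, and has the same index
as $\mathcal{S}$.

Proof: For a regular expression $\alpha$ on $V$, let $U_\alpha = \langle V, S_\alpha, M_\alpha, S^0_\alpha, F_\alpha \rangle$ be a
nondeterministic word automaton with $\mathcal{L}(U_\alpha) = \alpha$. Let $\Omega = \{\langle \alpha, \beta, \gamma \rangle : \text{there is } a \in
Act \text{ such that } \langle \alpha, \beta, \gamma \rangle \in R(a)\}$ be the set of all triples in $R(a)$, for some $a \in Act$, and
let $S_\Omega = \bigcup_{\langle \alpha, \beta, \gamma \rangle \in \Omega} S_\alpha \cup S_\beta \cup S_\gamma$ be the union of all the state spaces of the automata
associated with regular expressions that participate in $R$.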

As in the case of context-free rewrite systems, $\mathcal{A}$ checks that the input tree is the
$V$-exhaustive tree and then uses its labels in order to learn the state in $V^*$ that each node
corresponds to. As there, $\mathcal{A}$ applies to the transition function $\delta$ of $\mathcal{S}$ the rewrite rules of
$\mathcal{R}$. Here, however, the application of the rewrite rules on atoms of the form $\langle a \rangle q$ and
$[a]q$ is more involved, and we describe it below. Assume that $\mathcal{A}$ wants to check whether
$\mathcal{S}^t$ accepts $G^x_{\mathcal{R}}$, and it wants to proceed with an atom $\langle a \rangle q$ in $\delta(t)$. The automaton $\mathcal{A}$
needs to check whether $\mathcal{S}^q$ accepts $G^y_{\mathcal{R}}$ for some state $y$ reachable from $x$ by applying
an $a$-rule. That is, a state $y$ for which there is $\langle \alpha, \beta, \gamma \rangle \in R(a)$ and partitions $x' \cdot z$ and
$y' \cdot z$, of $x$ and $y$, respectively, such that $x'$ is accepted by $U_\alpha$, $z$ is accepted by $U_\beta$,
and $y'$ is accepted by $U_\gamma$. The way $\mathcal{A}$ detects such a state $y$ is the following. From the
node $x$, the automaton $\mathcal{A}$ simulates the automaton $U_\alpha$ upwards (that is, $\mathcal{A}$ guesses a run
of $U_\alpha$ on the word it reads as it proceeds in direction $\uparrow$ from $x$ towards the root of the
$V$-tree). Suppose that on its way up to the root, $\mathcal{A}$ encounters a state in $F_\alpha$ as it reads the
node $z \in V^*$. This means that the word read so far is in $\alpha$, and can serve as the prefix
$x'$ above. If this is indeed the case (and $\mathcal{A}$ may also continue as if a state in $F_\alpha$ has not
been encountered; thus guess that the word read so far is not $x'$), then it is left to check
that the word $z$ is accepted by $U_\beta$, and that there is a state that is obtained from $z$ by
prefixing it with a word $y' \in \gamma$ that is accepted by $\mathcal{S}^q$. To check the first condition, $\mathcal{A}$
sends a copy in direction $\uparrow$ that simulates a run of $U_\beta$, hoping to reach a state in $F_\beta$ as it
reaches the root (that is, $\mathcal{A}$ guesses a run of $U_\beta$ on the word it reads as it proceeds from $z$
up to the root of the $V$-tree). To check the second condition, $\mathcal{A}$ simulates the automaton
$U_\gamma$ downwards. A node $y' \cdot z \in V^*$ that $\mathcal{A}$ reads as it encounters a state in $F_\gamma$ can serve
as the state $y$ we are after. The case for an atom $[a]q$ is similar, only that here $\mathcal{A}$ needs to
check whether $\mathcal{S}^q$ accepts $G^y_{\mathcal{R}}$ for all states $y$ reachable from $x$ by applying an $a$-rule,
and thus the choices made by $\mathcal{A}$ for guessing the partition $x' \cdot z$ of $x$ and the prefix $y'$ of
$y$ are now treated dually.
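The successor relation that $\mathcal{A}$ simulates here can be stated as a plain membership test. The sketch below is only an illustration of the relation itself, not of the automaton: it assumes that letters of $V$ are single characters and that the regular expressions $\alpha$, $\beta$, $\gamma$ are written in the syntax of Python's `re` module. The function name `is_successor` is hypothetical.

```python
import re

def is_successor(x, y, rules):
    """Decide whether y is a successor of x under some rule (alpha, beta, gamma).

    x, y: words over V as strings (first character is the top letter).
    rules: iterable of triples of regular expressions, i.e. R(a) for one action a.
    y is a successor of x iff x = x1.z and y = y1.z with
    x1 in alpha, z in beta, and y1 in gamma.
    """
    for alpha, beta, gamma in rules:
        for i in range(len(x) + 1):          # try every split x = x1 . z
            x1, z = x[:i], x[i:]
            if not y.endswith(z):
                continue                     # y must share the suffix z
            y1 = y[:len(y) - len(z)]
            if (re.fullmatch(alpha, x1) and re.fullmatch(beta, z)
                    and re.fullmatch(gamma, y1)):
                return True
    return False
```

With the single rule `("A", "C*", "B+")`, the state `"ACC"` has successors `"BCC"`, `"BBCC"`, and so on, so a state can have infinitely many successors. A context-free rule $\langle A, x \rangle$ corresponds to a triple whose middle component accepts every word, such as `("A", "[ABC]*", "BA")` over the letters A, B, C.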

In order to follow the above application of rewrite rules, the state space of $\mathcal{A}$ is
$Q' = Q \times \Omega \times S_\Omega \times \{0, 1, 2, 3\} \times \{\forall, \exists\} \times (V \cup \{\bot, \#\})$. Thus, a state is a 6-tuple
$q' = \langle q, \langle \alpha, \beta, \gamma \rangle, s, i, b, A \rangle$, where $A$ is the expected direction of the current
node (needed in order to check the $V$-exhaustiveness), $i \in \{0, 1, 2, 3\}$ is the current
simulation mode (states in mode 0 are action states, where we apply $\mathcal{R}$ on the transitions
in $\delta$, and states in modes 1, 2, and 3 are states where we simulate automata for $\alpha$, $\beta$, and $\gamma$,
respectively), $b \in \{\forall, \exists\}$ is the simulating mode (depending on whether we are applying
$\mathcal{R}$ to an $\langle a \rangle$ or an $[a]$ atom), $\langle \alpha, \beta, \gamma \rangle$ is the rewrite rule in $R(a)$ we are applying, and $s$
is the current state of the simulated automaton^3. The formal definition of the transition
function of $\mathcal{A}$ follows quite straightforwardly from the definition of the state space and
the explanation above.

The acceptance condition of $\mathcal{A}$ is the adjustment of $F$ to the new state space. That is,
it is obtained from $F$ by replacing each set $F_i$ by the set $F_i \times \Omega \times S_\Omega \times \{0\} \times \{\forall, \exists\} \times
(V \cup \{\bot, \#\})$. Considering only action states excludes runs in which the simulation of
the automata for the regular expressions continues forever. Indeed, as long as a copy
of $\mathcal{A}$ simulates an automaton $U_\alpha$, $U_\beta$, or $U_\gamma$, it stays in simulation mode 1, 2, or 3,
respectively. □

The constructions described in Theorems 3 and 4 reduce the model-checking problem
to the nonemptiness problem of an alternating two-way parity tree automaton. By
Theorem 1, we then have the following.

Theorem 5. The model-checking problem for a context-free or a prefix-recognizable
rewrite system $\mathcal{R} = \langle V, Act, R, v_0 \rangle$ and a graph automaton $\mathcal{S} = \langle Act, Q, \delta, q_0, F \rangle$ can
be solved in time exponential in $nk$, where $n = |Q| \cdot |R| \cdot |V|$ and $k$ is the index of $\mathcal{S}$.

Together with Theorem 2, we can conclude with an EXPTIME bound also for the
model-checking problem of $\mu$-calculus formulas, matching the lower bound in [Wal96].
Note that the fact that the same complexity bound holds for both context-free and
prefix-recognizable rewrite systems stems from the different definition of $|R|$ in the two cases.



5 Extensions 

The automata-theoretic approach offers several extensions to the model-checking setting. 
We describe some of these extensions below. 



5.1 Regular State Properties 

The systems we want to reason about often have, in addition to a set of actions, also
a set $P$ of state properties. In the case of finite-state systems, these are described by a
mapping $L : S \to P$ that associates with each state of the labeled transition graph that
models the system, the property that is true in it (for simplicity, we assume that exactly
one property holds in each state). In our case, of infinite-state graphs induced by rewrite
systems, we consider regular state properties, where each property $p \in P$ is associated
with a regular expression $[p]$ over $V$, describing the set of states (words in $V^*$) in which
$p$ holds. Again, we assume that for each $x \in V^*$, there is a single $p \in P$ such that
$x \in [p]$.

^3 Note that a straightforward representation of $Q'$ results in $O(|Q| \cdot |\Omega| \cdot |S_\Omega| \cdot |V|)$ states. Since,
however, the states of the automata for the regular expressions are disjoint, we can assume that
the triple in $\Omega$ that each automaton corresponds to is uniquely defined from it.

In order to specify behaviors of labeled transition graphs with regular state properties
in $P$, we consider an extension of graph automata with the alphabet $P$. The
transition function of an extended automaton $\mathcal{S} = \langle P, Act, Q, \delta, q_0, F \rangle$ is $\delta : Q \times P \to
\mathcal{B}^+(next(Act) \times Q)$; thus $\mathcal{S}$ reads from the input graph both the state properties, in order
to know with which transition to proceed, and the actions, in order to know to which
successors to proceed. The formal definition of a run of an extended graph automaton
on a labeled transition graph with state properties is the straightforward extension of the
definition given in Section 2.4 for the graph automata described there. Alternatively, one
can consider a $\mu$-calculus with both state properties and actions [Koz83]. Theorem 2
holds also for formulas in such a $\mu$-calculus.

Having our solution to the model-checking problem based on two-way automata, it
is simple to extend it to graphs and specifications with state properties. Indeed, whenever
the automaton $\mathcal{A}$ from Theorems 3 and 4 reads the state $x \in V^*$ and takes a transition
from an action state, it should now also guess the property $p$ that holds in $x$ and proceed
according to the transition function of the specification automaton with input letter $p$.
In order to check that the guess $x \in [p]$ is correct, the automaton simulates the word
automaton $U_{[p]}$ upwards, hoping to visit an accepting state when the root is reached.
The complexity of the model-checking algorithm stays the same.



5.2 Fairness 

The systems we want to reason about are often augmented with fairness constraints.
Like state properties, we can define a regular fairness constraint by a regular expression
$\alpha$, where a computation of the labeled transition graph is fair iff it contains infinitely
many states in $\alpha$ (this corresponds to weak fairness; other types of fairness can be defined
similarly). It is easy to extend our model-checking algorithm to handle fairness (that is,
let the path quantification in the specification range only over fair paths^4): the automaton
$\mathcal{A}$ can guess whether the state currently visited is in $\alpha$, and then simulate the word
automaton $U_\alpha$ upwards, hoping to visit an accepting state when the root is reached.
When $\mathcal{A}$ checks an existential property, it has to make sure that the property is satisfied
along a fair path, and it is therefore required to visit infinitely many states in $\alpha$. When $\mathcal{A}$
checks a universal property, it may guess that a path it follows is not fair, in which case
$\mathcal{A}$ eventually always sends copies that simulate the automaton for $\neg\alpha$. The complexity
of the model-checking algorithm stays the same.

5.3 Backward Modalities 

Another extension is the treatment of specifications with backwards modalities. While
forward modalities express weakest precondition, backward modalities express strongest
postcondition, and they are very useful for reasoning about the past [LPZ85]. In order to
adjust graph automata to backward reasoning, we add to $next(Act)$ the "directions" $\langle a^- \rangle$
and $[a^-]$. This enables the graph automata to move to $a$-predecessors of the current state.
More formally, if a graph automaton reads a state $x$ of the input graph, then fulfilling an
atom $\langle a^- \rangle t$ requires $\mathcal{S}$ to send a copy in state $t$ to some $a$-predecessor of $x$, and dually
for $[a^-]t$. Theorem 2 can then be extended to $\mu$-calculus formulas and graph automata
with both forward and backward modalities [Var98].

^4 The exact semantics of fair graph automata as well as fair $\mu$-calculus is not straightforward, as
they enable cycles in which we switch between existential and universal modalities. To make
our point here, it is simpler to assume, say, graph automata that correspond to CTL* formulas.

Extending our solution to graph automata with backward modalities is simple.
Consider a node $x \in V^*$ in a prefix-recognizable graph. The $a$-predecessors of $x$ are states $y$
for which there is a rule $\langle \alpha, \beta, \gamma \rangle \in R(a)$ and partitions $x' \cdot z$ and $y' \cdot z$, of $x$ and $y$,
respectively, such that $x'$ is accepted by $U_\gamma$, $z$ is accepted by $U_\beta$, and $y'$ is accepted by $U_\alpha$.
Hence, we can define a mapping $R^-$ such that $\langle \gamma, \beta, \alpha \rangle \in R^-(a)$ iff $\langle \alpha, \beta, \gamma \rangle \in R(a)$,
and handle atoms $\langle a^- \rangle t$ and $[a^-]t$ exactly as we handle $\langle a \rangle t$ and $[a]t$, only that for them
we apply the rewrite rules in $R^-$ rather than those in $R$. The complexity of the
model-checking algorithm stays the same. Note that the simple solution relies on the fact that
the structure of the rewrite rules in a prefix-recognizable rewrite system is symmetric
(that is, switching $\alpha$ and $\gamma$ results in a well-structured rule), which is not the case for
context-free rewrite systems^5.
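The mapping $R^-$ amounts to swapping the first and third components of every rule. A minimal sketch, assuming rules are stored as triples of regular-expression strings in a dictionary indexed by action (a representation chosen for this illustration only):

```python
def invert_rules(R):
    """Build R^- from R: (gamma, beta, alpha) is in R^-(a)
    iff (alpha, beta, gamma) is in R(a)."""
    return {a: {(gamma, beta, alpha) for (alpha, beta, gamma) in rules}
            for a, rules in R.items()}
```

Inverting twice gives back the original rules. For a context-free rule written as the triple `("A", "V*", "x")`, the inverse `("x", "V*", "A")` is a valid prefix-recognizable rule but in general no longer has the shape of a context-free rule, since its first component is a word rather than a single letter.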

5.4 Global Model Checking 

In the full paper we show that in addition to checking whether a system $\mathcal{R}$ satisfies a
specification $\mathcal{S}$, we can compute the regular languages of all states satisfying $\mathcal{S}$; thus
we solve the global model-checking problem. For a rewrite system $\mathcal{R}$ and a regular
language $L$, let $post(L)$, $post^*(L)$, $pre(L)$, and $pre^*(L)$ be the sets of states in $G_{\mathcal{R}}$
that are immediate successors of the states in $L$, successors of the states in $L$, immediate
predecessors of the states in $L$, and predecessors of the states in $L$, respectively. The
predicates above can be viewed as specifications. Indeed, $post(L) = \langle + \rangle L$, $post^*(L) =
\mu y.L \vee \langle + \rangle y$, $pre(L) = \langle - \rangle L$, and $pre^*(L) = \mu y.L \vee \langle - \rangle y$ (in a $\mu$-calculus with state
predicates, where $\langle + \rangle$ and $\langle - \rangle$ are the "next" and "previously" modalities). Hence, the
algorithm can be used to compute successors and predecessors of regular state sets, and
can be viewed as the automata-theoretic approach to the algorithms in [BEM97].

This observation is related to the work in [LS98], where bottom-up automata on
finite trees are used in order to recognize sets of terms in Process Algebra. Given a term
$t$, [LS98] shows that it is possible to define $post^*(t)$ as the solution of a regular equation.
They conclude that $post^*(t)$ is a regular tree language, and similarly for $post(t)$, $pre(t)$,
and $pre^*(t)$.



^5 Note that this does not mean we cannot model check specifications with backwards modalities in
context-free rewrite systems. It just means that doing so involves rewrite rules that are no longer
context free. Indeed, a rule $\langle A, x \rangle \in R(a)$ in a context-free system corresponds to the rule
$\langle A, V^*, x \rangle \in R(a)$ in a prefix-recognizable system, inducing the rule $\langle x, V^*, A \rangle \in R^-(a)$.

6 Realizability and Synthesis



Given a rewrite system $\mathcal{R} = \langle V, Act, R, v_0 \rangle$, a strategy of $\mathcal{R}$ is a function $f : V^* \to
Act$. The function $f$ restricts the graph $G_{\mathcal{R}}$ so that from a state $x \in V^*$, only $f(x)$-actions
are taken. Formally, $\mathcal{R}$ and $f$ together define the graph $G_{\mathcal{R},f} = \langle V^*, Act, \rho, v_0 \rangle$,
where $\rho(x, a, y)$ iff $f(x) = a$ and $\rho_{\mathcal{R}}(x, a, y)$. Given $\mathcal{R}$ and a graph automaton $\mathcal{S} =
\langle Act, Q, \delta, q_0, F \rangle$, we say that a strategy $f$ of $\mathcal{R}$ is winning for $\mathcal{S}$ iff $G_{\mathcal{R},f}$ satisfies $\mathcal{S}$.
Given $\mathcal{R}$ and $\mathcal{S}$, the problem of realizability is to determine whether there is a winning
strategy of $\mathcal{R}$ for $\mathcal{S}$. The problem of synthesis is then to construct such a strategy.
The setting described here corresponds to the case where the system needs to satisfy a
specification with respect to environments modeled by a rewrite system. Then, at each
state, the system chooses the action to proceed with and the environment provides the
rules that determine the successors of the state. Branching-time realizability of finite-state
systems can be viewed as a special case of our setting here, where for all actions
$a \in Act$, we have $R(a) = \{\langle \varepsilon, V^*, A \rangle\}$. Thus, from each state $x \in V^*$, we can apply
$a$-transitions to all the states $A \cdot x$, for $A \in V$.
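The way a strategy prunes the graph can be sketched for the context-free case. The names `cf_successors` and `restricted_successors`, and the representation of the strategy as a Python function on strings, are assumptions of this illustration.

```python
def cf_successors(word, a, R):
    """a-successors of a word under context-free rules R[a] = {(A, x), ...}."""
    if word == "":
        return set()
    head, rest = word[0], word[1:]
    return {x + rest for (A, x) in R.get(a, set()) if A == head}

def restricted_successors(word, f, R):
    """Transitions leaving `word` in the graph restricted by strategy f.

    Only the action f(word) chosen by the system is kept; the rules for
    that action (the environment) then determine the successors.
    Returns a set of pairs (action, successor).
    """
    a = f(word)
    return {(a, y) for y in cf_successors(word, a, R)}
```

For example, with rules `{"push": {("A", "AA")}, "pop": {("A", "")}}` and a strategy that pushes while the word is shorter than two letters and pops otherwise, the state `"A"` keeps only its push-transition to `"AA"`, and `"AA"` keeps only its pop-transition to `"A"`.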

The automaton $\mathcal{A}$ from Theorem 4 can be modified to solve the realizability problem
and to generate winning strategies. The idea is simple: a strategy $f : V^* \to Act$ can
be viewed as an $Act$-labeled $V$-tree. Thus, the realizability problem can be viewed as
the problem of determining whether we can augment the labels of the $V$-labeled
$V$-exhaustive tree by elements in $Act$, and accept the augmented tree in a run of $\mathcal{A}$ in
which whenever $\mathcal{A}$ reads an action $a \in Act$, it applies to the transition function of the
specification graph automaton only rewrite rules in $R(a)$. Hence the following theorem.

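Theorem 6. Given a prefix-recognizable rewrite system $\mathcal{R} = \langle V, Act, R, v_0 \rangle$ and a
graph automaton $\mathcal{S} = \langle Act, Q, \delta, q_0, F \rangle$, we can construct an alternating two-way
parity automaton $\mathcal{A}$ over $((V \cup \{\bot\}) \times Act)$-labeled $V$-trees such that $\mathcal{L}(\mathcal{A})$ contains
exactly all the $V$-exhaustive trees whose projection on $Act$ is a winning strategy of $\mathcal{R}$
for $\mathcal{S}$. The automaton $\mathcal{A}$ has $O(|Q| \cdot |R| \cdot |V|)$ states, and has the same index as $\mathcal{S}$.

Proof: Exactly as in Theorem 4, only that from an action state we proceed with the
rules in $R(a)$, where $a$ is the $Act$-element of the letter we read. For example, in the case
of a context-free rewrite system, we would have, for $c \in next(Act)$, $q \in Q$, $A \in V$,
and $a \in Act$ (the new parameter to $apply_{\mathcal{R}}$, which is read from the input tree),

$$
apply_{\mathcal{R}}(c, q, A, a) =
\begin{cases}
\langle \varepsilon, \langle q, \varepsilon, A \rangle \rangle & \text{if } c = \varepsilon, \\
\bigwedge_{\langle A, y \rangle \in R(a)} \langle \uparrow, \langle q, y, \# \rangle \rangle & \text{if } c = [a], \\
\textbf{true} & \text{if } c = [b], \text{ for } b \neq a, \\
\bigvee_{\langle A, y \rangle \in R(a)} \langle \uparrow, \langle q, y, \# \rangle \rangle & \text{if } c = \langle a \rangle, \\
\textbf{false} & \text{if } c = \langle b \rangle, \text{ for } b \neq a.
\end{cases}
$$

□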

Let $n = |Q| \cdot |R| \cdot |V|$, let $k$ be the index of $\mathcal{S}$, and let $\Sigma = (V \cup \{\bot\}) \times Act$. By
Theorem 1, we can transform $\mathcal{A}$ to a nondeterministic one-way parity tree automaton $\mathcal{A}'$
with $2^{O(nk)}$ states and index $O(nk)$. By [Rab69,Eme85], if $\mathcal{A}'$ is nonempty, there exists
a $\Sigma$-labeled $V$-tree $\langle V^*, f \rangle$ such that for all $\sigma \in \Sigma$, the set of nodes $x \in V^*$ for
which $f(x) = \sigma$ is a regular set. Moreover, the nonemptiness algorithm of $\mathcal{A}'$, which
runs in time exponential in $nk$, can be easily extended to construct, within the same
complexity, a deterministic word automaton $U_{\mathcal{A}}$ over $V$ such that each state of $U_{\mathcal{A}}$ is
labeled by a letter $\sigma \in \Sigma$, and for all $x \in V^*$, we have $f(x) = \sigma$ iff the state of $U_{\mathcal{A}}$
that is reached by following the word $x$ is labeled by $\sigma$. The automaton $U_{\mathcal{A}}$ is then the
answer to the synthesis problem.

The construction described in Theorems 3 and 4 implies that the realizability and
synthesis problems are in EXPTIME. Thus, they are no harder than the satisfiability problem
for the $\mu$-calculus, and the bound matches the known lower bound [FL79]. Formally, we have
the following.

Theorem 7. The realizability and synthesis problems for a context-free or a prefix-recognizable
rewrite system $\mathcal{R} = \langle V, Act, R, v_0 \rangle$ and a graph automaton $\mathcal{S} =
\langle Act, Q, \delta, q_0, F \rangle$ can be solved in time exponential in $nk$, where $n = |Q| \cdot |R| \cdot |V|$,
and $k$ is the index of $\mathcal{S}$.

By Theorem 2, if the specification is given by a $\mu$-calculus formula $\psi$, the bound is
the same, with $n = |\psi| \cdot |R| \cdot |V|$, and $k$ being the alternation depth of $\psi$.

7 Discussion 

The automata-theoretic approach has long been thought to be inapplicable for effective
reasoning about infinite-state systems. We showed that infinite-state systems for which
decidability is known can be described by finite-state automata: the states
of such systems can be viewed as nodes in an infinite tree, and transitions
between states can be expressed by finite-state automata. As a result, automata-theoretic
techniques can be used to reason about such systems. In particular, we showed that
various problems related to the analysis of such systems can be reduced to the emptiness
problem for alternating two-way tree automata. Our framework achieves the same
complexity bounds as known model-checking algorithms, and it enables several extensions,
such as treatment of state properties, fairness constraints, backwards modalities,
and global model checking. Our framework also provides a solution to the realizability
problem.

An interesting open problem is the extension of our framework to the linear paradigm.
Since LTL formulas can be translated to automata, a simple extension of our framework
to handle specifications in LTL is possible. Nevertheless, since our algorithm involves
a translation of a two-way alternating automaton to a nondeterministic automaton, we
would end up with a complexity that is at least exponential in the system, which is worse
than known polynomial algorithms [EHRS00].

Abstract. We propose a new method for the verification of parameterized
cache coherence protocols. Cache coherence protocols are used to
maintain data consistency in multiprocessor systems equipped with local
fast caches. In our approach we use arithmetic constraints to model
possibly infinite sets of global states of a multiprocessor system with
many identical caches. In preliminary experiments using symbolic model
checkers for infinite-state systems based on real arithmetics (HyTech
[HHW97] and DMC [DP99]) we have automatically verified safety properties
for parameterized versions of widely implemented write-invalidate
and write-update cache coherence policies like the Mesi, Berkeley, Illinois,
Firefly and Dragon protocols [Han93]. With this application, we
show that symbolic model checking tools originally designed for hybrid
and concurrent systems can be applied successfully to a new class of
infinite-state systems of practical interest.



1 Introduction 

In a shared-memory multiprocessor system local caches are used to reduce me- 
mory access latency and network traffic. Each processor is connected to a fast 
memory backed up by a large (and slower) main memory. This configuration en- 
ables processors to work on local copies of main memory blocks, greatly reducing 
the number of memory accesses that the processor must perform during program 
execution. Although local caches improve system performance, they introduce 
the cache coherence problem: multiple cached copies of the same block of memory 
must be consistent at any time during a run of the system. A cache coherence 
protocol ensures the data consistency of the system: the value returned by a read 
must always be the last value written to that location (cf. [AB86, Han93, PD95]). 
Coherence policies can be described as finite state machines that specify the way 
a single cache reacts to read and write requests. As an example, let us consider 
a CC-UMA (Uniform Memory Access with local caches) multiprocessor 
system, i.e., a system in which all processors have a local cache connected to 
the main memory via a shared bus. In write-invalidate protocols, whenever a 
processor modifies its cache block a bus invalidation signal is sent to all other 
caches in order to invalidate their content. Instead, in write-update protocols a 
copy of the new data is sent to all caches that share the old data. 
Due to the increasing complexity of hardware architectures, the development 
of automatic verification techniques is becoming a major goal to help discover 
errors at an early stage of protocol design (see e.g. [CGH+93, MS91]). 
In particular, one of the main challenges in this area is to develop techniques 
for validating protocols for every possible number of processors (see e.g. 
[EN98, HQR99, PD95]). In this paper, drawing inspiration from recent works 
on verification of parameterized concurrent systems (e.g. [GS92, EN96, EN98, 
EN98b, EFM99, LHR97]) we propose a new method for the verification of 
parameterized cache coherence protocols at the behavior (specification) level. As 
mentioned before, in this context a multiprocessor system can be modeled as a 
collection of many identical finite-state machines. As a first step, we apply the 
following abstraction: we keep track only of the number of caches in every possible 
protocol state. The resulting abstract protocol can be represented as a transition 
system with data variables ranging over non-negative integers. Thus, an abstract 
protocol can be formally described as an Extended Finite State Machine (EFSM) 
[CK97]. Via this abstraction, we represent all symmetric global states (global 
state = collection of individual cache states) using a single EFSM-state. We then 
use arithmetic constraints to implicitly represent (potentially infinite) sets of 
EFSM-states (tuples of natural numbers). This way, we are able to represent 
safety properties independently from the number of processors, and we reduce 
the verification problem for parameterized cache coherence protocols to a 
reachability problem for EFSMs. The latter problem can be attacked using general 
purpose, infinite-state symbolic model checking methods defined for integer or 
real arithmetic (see e.g. [BGP97, BW98, Hal93, HHW97, DP99]). Following the 
general methodology we suggest in [Del00, DP99], we apply efficient tools based 
on real arithmetic (thus, applying a relaxation from integers to reals during 
the analysis) to automatically check safety properties like data consistency for 
snoopy, write-invalidate and write-update cache coherence protocols for CC-UMA 
multiprocessors [AB86, Han93, PD97]. 

More precisely, our contributions are as follows. We first show that parameterized 
versions of a large class of cache coherence protocols can be formulated 
in terms of EFSMs. The class of EFSMs we consider is an extension of the 
broadcast protocols of Emerson and Namjoshi [EN98]. In order to model coherence 
policies like the Illinois protocol without abstracting away properties 
that are crucial for their validation (see discussion in Section 5), we need to 
enforce global conditions that cannot be represented using broadcast protocols. To 
prove the adequacy of this encoding, we relate the EFSM model to the finite 
state machine model of cache coherence protocols proposed by Pong and Dubois 
[PD95]. Based on this idea, we define a general method for the validation of 
parameterized coherence protocols. The method is based on invariant checking for 
the corresponding EFSMs via the backward reachability algorithm of [ACJT96]. 
In contrast to forward reachability, the algorithm of [ACJT96] is guaranteed 
to terminate for the subclass of EFSMs denoting the broadcast protocols of 
[EN98] under the additional hypothesis that the set of unsafe states is 
upward-closed [EFM99]. Safety properties can often be modeled as upward-closed sets 
[AJ99]. We choose a symbolic representation of (potentially infinite) sets of 
states via arithmetic constraints. The constraint operations of variable elimination, 
satisfiability and entailment test can be used to implement a symbolic version 
of the algorithm of [ACJT96]. However, following [DP99], in order to obtain 
an efficient procedure we interpret the above mentioned constraint operations 
over the reals instead of over the integers. This relaxation technique is widely used 
in integer programming and program analysis. As for other methods handling 
global conditions in parameterized systems (e.g. [ABJN99]) and other methods 
for infinite-state systems (e.g. [BGP97, BW98, DP99, HHW97]), the resulting 
procedure is a semi-algorithm that must be evaluated on practical examples. We 
give sufficient conditions for the termination of the resulting procedure. 
Specifically, we show that the symbolic version of the abstract algorithm of [ACJT96] 
where sets of states are represented as arithmetic constraints is robust under 
the relaxation from integers to reals (it always terminates, solving the control 
reachability problem of [AJ99]) whenever: (a) the input EFSM is a broadcast protocol 
[EN98]; (b) the unsafe states are represented via a special class of constraints 
that denote upward-closed sets. This result seems to be a new application of 
general methods for proving the well-structuredness of infinite-state systems 
[ACJT96, AJ99, FS98]. We use two existing constraint-based model checkers 
that implement the symbolic backward reachability algorithm described above, 
namely HyTech [HHW97] (that provides efficient data structures) and DMC 
[DP99] (that provides built-in accelerations), to check several safety properties 
for the MESI, University of Illinois, Berkeley RISC, DEC Firefly and Xerox 
PARC Dragon protocols [AB86, PD95, Han93]. Though the termination of our 
method is guaranteed only for broadcast protocols, the preliminary results show 
that it performs well in practice. 

To our knowledge, this is the first time that general purpose symbolic model 
checkers for infinite-state systems working over arithmetical domains are used 
for verification of parameterized cache coherence protocols. With this application, 
we have shown that techniques developed in recent years for hybrid and 
concurrent systems can also be applied to a new class of infinite-state systems 
of practical interest. 

The HyTech and DMC code of the protocols, together with the results of their 
analysis and links to download the tools, are available on the web at the following 
address: http://www.disi.unige.it/person/DelzannoG/protocol.html. 



2 The Finite State Machine Model 

According to [PD95, PD97, EN98], we limit ourselves to consider protocols 
controlling single memory blocks and single cache lines. Following [PD95], a cache 
coherence protocol for a multiprocessor system with $k$ local caches $C_1, \ldots, C_k$ 
can be represented via the following finite state machine model. 

Local Machine. Each of the caches has the same finite set $Q$ of states. The 
transitions of cache $C_i$ may be guarded by global conditions that depend on the 
state of the other caches. The global conditions from the perspective of $C_i$ are 
represented via a predicate $f_i$. As an example of a global condition, let us fix a 
state $q \in Q$. Then, we could let $f_i = true$ if and only if in the current state of the 
system there exists a cache $C_j$ ($j \neq i$) whose state is equal to $q$. Formally, the 
behavior of the cache $C_i$ is represented as a finite system $(Q, \Sigma_i, \delta_i, f_i)$, where 
$Q$ is the set of states, $\Sigma_i$ is the set of operations causing state transitions, 
$f_i : Q^k \rightarrow \{true, false\}$ is a predicate that represents the global conditions from the 
perspective of $C_i$, and $\delta_i$ defines the state transition $Q \times \Sigma_i \times \{true, false\} \rightarrow Q$. 
The third component in the domain of $\delta_i$ is the guard for the transitions of $C_i$. 
As an example, let us fix a state $q \in Q$, and an operation $\sigma \in \Sigma_i$. Then, we 
could set $\delta_i(q, \sigma, true) = q'$ to express that cache $C_i$ can go from state $q$ to state 
$q'$ whenever $f_i$ is satisfied in the current global state. The previous definitions 
allow us to compose the machines of the individual caches $C_1, \ldots, C_k$ into a 
single global machine $\mathcal{M}_G$. 

Global Machine. A global state $G$ of $\mathcal{M}_G$ is defined as the composition of the 
states of the individual caches. Formally, $\mathcal{M}_G$ is a tuple $(Q_G, \Sigma_G, \mathcal{F}, \delta_G)$, where 
$Q_G = Q^k$, $\Sigma_G = \Sigma_1 \cup \ldots \cup \Sigma_k$, $\mathcal{F}$ is the global characteristic predicate $(f_1, \ldots, f_k)$, 
and $\delta_G : Q_G \times \Sigma_G \rightarrow Q_G$. The transition function $\delta_G$ is defined as follows. 
Given a global state $G = (q_1, \ldots, q_k)$, $\delta_G(G, \sigma) = (q'_1, \ldots, q'_k)$ if and only if 
$q'_i = \delta_i(q_i, \sigma, f_i(G))$ for $i = 1, \ldots, k$. A run of $\mathcal{M}_G$ is a possibly infinite sequence 
of global states $G_1, \ldots, G_n, \ldots$ where $\delta_G(G_n, \sigma) = G_{n+1}$ for some $\sigma \in \Sigma_G$. We 
write $G \rightarrow^{*} G'$ to denote the existence of a run that goes from $G$ to $G'$. 

Terminology. In the rest of the paper we use the state invalid to denote either 
that the cache has no data or that its content has been invalidated. Cache 
coherence protocols implement the following basic operations from the perspective 
of cache $C_i$ ($state(C_i) \in Q$ denotes its current state): Read Miss, a 
read request is sent to $C_i$ and $state(C_i) = invalid$; Read Hit, a read request is 
sent to cache $C_i$ and $state(C_i) \neq invalid$; Write Miss, a write request is sent 
to $C_i$ and $state(C_i) = invalid$; Write Hit, a write request is sent to $C_i$ and 
$state(C_i) \neq invalid$. According to the previous definitions, in the next section 
we give a brief description of a widely implemented snoopy, write-invalidate 
protocol we selected as our main case study. 

2.1 The Illinois Protocol 

The University of Illinois protocol is a snoopy cache, write-invalidate, write-in 
coherence policy, originally proposed by Papamarcos and Patel [PP84] (see also 
[AB86, PD95]). The special feature is that caches can have exclusive copies 
of data. Bus invalidation signals are sent only for writes to shared data. The 
memory copy is updated using a write-back policy (replace operation). Formally, 
in addition to invalid, caches assume one of the following states: valid-exclusive, 
the cache has an exclusive copy of the data that is consistent with the memory 
such that a modification of its content requires no bus invalidation signal; shared, 
the cache has a copy of the data consistent with the memory and other caches 
may have copies of the data; dirty, the cache has a modified copy of the data, 
i.e., the data in main memory are obsolete and the content of the other caches 
is not valid. The operations are $R_i$ (read), $W_i$ (write), and $Rep_i$ (replace) for 
$i = 1, \ldots, k$ ($k$ = number of caches). The characteristic predicate $f_i$ in $\mathcal{F}_{Ill}$ is used 
by cache $C_i$ to decide whether or not to move from invalid to valid-exclusive. 
Formally, $f_i$ is defined as follows. 

$f_i((q_1, q_2, \ldots, q_k))$ if and only if $\exists\, j \neq i$ such that $q_j \neq invalid$ 

Fig. 1. Illinois protocol for 2 caches viewed from the perspective of cache Ci. 

The possible transitions from the perspective of cache $C_i$ are as follows. 

Read Hit: no coherence action needs to be taken. 

Read Miss: if there exists a cache Cj whose state is dirty (i.e. fi = true), Cj 
supplies the missing block to Ci and updates the main memory. Both Ci 
and Cj end up in state shared. If there are shared or valid-exclusive copies 
in other caches (i.e. fi = true), Ci gets the missing block from one of the 
caches and all caches with a copy end up in state shared. If there is no cached 
copy (i.e. fi = false), Ci receives a valid-exclusive copy from main memory. 
Write Hit: if Ci is in state dirty, no action is taken. If Ci is in state valid- 
exclusive, its state changes to dirty (note: no invalidation signal is needed). 
If Ci is in state shared, its state changes to dirty and all remote copies must 
be invalidated. 

Write Miss: similar to Read Miss, except that all remote copies are invalidated 
and the state of Ci changes to dirty. 

Replace: if Ci is in state dirty, the data are written back to memory. 

For k = 2, the protocol from the perspective of cache Ci is shown in Fig. 1. 

Safety Properties. In this paper we limit ourselves to the verification of the safety 
property for data consistency [PD97]. As illustrated before, in the Illinois protocol 
the state shared indicates that a cache has a clean copy consistent with the 
memory and other caches' copies, whereas the state dirty indicates that it has the 
latest and sole copy. Thus, in this example there are two possible sources of data 
inconsistency: 

INV1: a dirty cache co-exists with one or more caches in state shared; 

INV2: there is more than one dirty cache. 

Thus, all global states that satisfy conditions INV1 or INV2 are unsafe. As 
mentioned before, we are interested in proving the protocol safe for every possible 
number of caches. For a fixed number of processors $k$ and a given protocol $P$, 
let $\mathcal{I}(k)$ be the set of initial global states and $\mathcal{U}(k)$ be the set of unsafe global 
states. The parameterized reachability problem is defined as follows. 

$\exists\, k \geq 1.\ \exists\, G_1 \in \mathcal{I}(k).\ \exists\, G_2 \in \mathcal{U}(k)$ such that $G_1 \rightarrow^{*} G_2$? 

If the previous statement is true for a given $k'$ then the protocol is not correct, 
i.e., an unsafe state may be reached for a system configuration with $k'$ caches. 
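
For a fixed k the reachability question above can be answered by plain enumeration. The following Python sketch is only an illustration under stated assumptions: it encodes the informal transition list of Section 2.1, starts from the global state in which all caches are invalid, and uses hypothetical names (`step`, `reachable`, `unsafe`) that do not come from the paper or from any tool. It says nothing about arbitrary k, which is exactly why the parameterized method is needed.

```python
from collections import deque

I, E, S, D = "invalid", "valid-exclusive", "shared", "dirty"
OPS = ("read", "write", "replace")

def step(g, i, op):
    """Global state reached when operation `op` is issued to cache i."""
    new = list(g)
    if op == "read" and g[i] == I:                       # Read Miss
        if any(q != I for j, q in enumerate(g) if j != i):
            # f_i = true: every cache holding a copy ends up shared
            new = [S if q != I else I for q in g]
            new[i] = S
        else:                                            # f_i = false
            new[i] = E
    elif op == "write" and g[i] == E:                    # Write Hit, no bus signal
        new[i] = D
    elif op == "write" and g[i] in (S, I):               # Write Hit on shared, Write Miss
        new = [I] * len(g)                               # remote copies are invalidated
        new[i] = D
    elif op == "replace" and g[i] != I:                  # Replace
        new[i] = I
    return tuple(new)                                    # all other cases: no change

def reachable(k):
    """All global states reachable from the all-invalid state with k caches."""
    start = (I,) * k
    seen, queue = {start}, deque([start])
    while queue:
        g = queue.popleft()
        for i in range(k):
            for op in OPS:
                h = step(g, i, op)
                if h not in seen:
                    seen.add(h)
                    queue.append(h)
    return seen

def unsafe(g):
    """INV1 (dirty together with shared) or INV2 (more than one dirty cache)."""
    dirty = g.count(D)
    return (dirty >= 1 and S in g) or dirty >= 2
```

Calling `any(unsafe(g) for g in reachable(k))` for a few small values of k gives a quick sanity check of the protocol description before moving to the counting abstraction of the next section.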

3 EFSMs for Parameterized Cache Coherence Protocols 

In order to check parameterized safety properties we apply the following 
abstraction. Let $Q$ be the set of cache states $q_1, \ldots, q_n$, then 

we keep track only of the number $x_i$ of processes in every state $q_i \in Q$. 

A global state $G$ with $k$ components ($k$ = number of caches) is mapped to a tuple 
of non-negative integers with $n$ components ($n$ = number of cache states). This way, 
all symmetric global states are clustered together into a single representation. 
Via this abstraction, we cannot prove properties of individual caches like ‘cache 
$i$ and cache $j$ cannot be in state dirty simultaneously’. However, we can still 
try to prove global properties like ‘two different caches cannot be in state dirty 
simultaneously’. This is the kind of property we are interested in to prove that 
the protocol will not give inconsistent (w.r.t. the semantics of states) results. The 
behavior of an arbitrary number of caches can be described finitely as a set of 
linear transformations describing the effect of the actions on the counters 
associated to the states in $Q$. For this purpose, we model the ‘abstract protocol’ 
as a single Extended Finite State Machine (EFSM) [CK97], i.e., a finite 
automaton with data variables (ranging over integers) associated to the locations 
and with guarded linear transformations associated to the transitions. Formally, 
let $\mathcal{M}_G$ be the global machine $(Q_G, \Sigma_G, \mathcal{F}, \delta_G)$ associated to a protocol $P$, and 
let $Q = \{q_1, \ldots, q_n\}$. We model $\mathcal{M}_G$ as an EFSM with only one location and 
$n$ data variables $(x_1, \ldots, x_n)$ ranging over non-negative integers. For simplicity, we 
will always omit the location. The EFSM-states are tuples of natural numbers 
$c = (c_1, \ldots, c_n)$, where $c_i$ denotes the number of caches with state $q_i \in Q$ during 
a run of $\mathcal{M}_G$. The transitions are represented via a collection of guarded linear 
transformations defined over the variables $x = (x_1, \ldots, x_n)$ and $x' = (x'_1, \ldots, x'_n)$ 
and having the following form 

$G(x) \rightarrow T(x, x')$, 

where $x_i$ and $x'_i$ denote the number of caches in state $q_i$ respectively before 
and after the occurrence of the transition. The guard $G(x)$ may be an arbitrary 
linear constraint over the variables $x$. However, in this paper we limit ourselves 
to constraints defined as $D_1 \cdot x \geq b \wedge D_2 \cdot x = c$ where $D_1$ is an $n \times n$ matrix 
with 0, 1 coefficients, $D_2$ is a diagonal $n \times n$ matrix with 0, 1 coefficients, and 
$b$ and $c$ are vectors of integers. This type of guard allows us to handle both 
local and global conditions over the global states of $\mathcal{M}_G$. For instance, consider 
again the function $\mathcal{F}_{Ill}$ of the Illinois protocol. Then, $f_i = false$ for some $i$ can 
be expressed as $x_1 \geq 1, x_2 = 0, \ldots, x_n = 0$. For the sake of this paper, the 
transformation $T(x, x')$ is defined as $x' = M \cdot x + c$ where $M$ is an $n \times n$ matrix 
with unit vectors as columns (i.e., there is exactly one non-zero coefficient = 1 in 
each column). This way, we can represent the changes of states of the caches in 
the system (including the invalidation signals). Since the number of caches is an 
invariant of the system, we require the transformation to satisfy the condition 
$x'_1 + \ldots + x'_n = x_1 + \ldots + x_n$. 

Remark 1. Let us call additive constraint a system of linear inequalities having 
the form $D \cdot x \geq c$ where $D$ is a matrix with 0, 1 coefficients, and $c$ is a vector of 
positive integers. (Note: an additive constraint can be expressed as a conjunction 
of atomic formulas $x_{i_1} + \ldots + x_{i_k} \geq c$ where $x_{i_1}, \ldots, x_{i_k}$ are distinct variables 
from $x$, and $c$ is a positive integer.) When all guards of an EFSM are restricted to 
additive constraints, we obtain the subclass of broadcast protocols introduced in 
[EN98]. Thus, in broadcast protocols it is not possible to enforce global conditions 
that, e.g., require tests for zero (constants). 

We show next some general patterns we use to model protocol actions via 
guarded transformations. 

Internal action. A cache moves from state $q_1$ to state $q_2$ ($q_1 \neq q_2$): 
$x'_1 = x_1 - 1$, $x'_2 = x_2 + 1$ with the proviso that $x_1 \geq 1$ is part of $G(x)$. 

Rendez-vous. Let us consider the case in which two caches synchronize on a 
signal. A cache $C$ in state $q_1$ synchronizes with a cache $C'$ in state $q_3$, the 
state of $C$ changes to $q_2$, the state of $C'$ changes to $q_4$ (all states are different). 
We model this transition as $x'_1 = x_1 - 1$, $x'_2 = x_2 + 1$, $x'_3 = x_3 - 1$, $x'_4 = x_4 + 1$, 
with the proviso that $x_1 \geq 1, x_3 \geq 1$ is part of $G(x)$. 

Synchronization. All the caches in state $q_1, \ldots, q_m$ go to state $q_i$, e.g., for 
$i > m$. We model this transition as $x'_1 = 0, \ldots, x'_m = 0$, $x'_i = x_i + x_1 + \ldots + x_m$. 
This feature can be used, e.g., to model bus invalidation signals. 
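
The three patterns above can be written down directly as functions on counter tuples. The sketch below is an illustration only: the function names and the convention of returning `None` when the guard fails are assumptions made here, states are identified by their index in the tuple, and `synchronization` assumes the target index is not among the sources.

```python
from typing import Optional, Sequence, Tuple

Counts = Tuple[int, ...]

def internal(x: Counts, src: int, dst: int) -> Optional[Counts]:
    """One cache moves from state src to state dst (guard: x[src] >= 1)."""
    if x[src] < 1:
        return None
    y = list(x)
    y[src] -= 1
    y[dst] += 1
    return tuple(y)

def rendezvous(x: Counts, src1: int, dst1: int, src2: int, dst2: int) -> Optional[Counts]:
    """Two caches move together (guard: x[src1] >= 1 and x[src2] >= 1)."""
    if x[src1] < 1 or x[src2] < 1:
        return None
    y = list(x)
    y[src1] -= 1
    y[dst1] += 1
    y[src2] -= 1
    y[dst2] += 1
    return tuple(y)

def synchronization(x: Counts, sources: Sequence[int], dst: int) -> Counts:
    """All caches in the source states move to dst (dst must not be a source)."""
    y = list(x)
    for s in sources:
        y[dst] += y[s]
        y[s] = 0
    return tuple(y)
```

Each function preserves the sum of the counters, which is the conservation condition required of every transformation.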

A run of an EFSM $\mathcal{E}$ is a (possibly infinite) sequence of EFSM-states $c_1, \ldots, c_i, \ldots$ 
where $G_r(c_i) \wedge T_r(c_i, c_{i+1}) = true$ for some transition $G_r \rightarrow T_r$ in $\mathcal{E}$. We 
will denote the existence of a run from $c$ to $c'$ as $c \rightarrow^{*} c'$. Finally, we define a 
predecessor operator $pre : \mathcal{P}(\mathbb{N}^n) \rightarrow \mathcal{P}(\mathbb{N}^n)$ over sets of EFSM-states as follows. 

$pre(S) = \{ c \mid c \rightarrow c',\ c' \in S \}$. 

Here $\rightarrow$ indicates a one-step EFSM state transition. 

(r1) $dirty + shared + exclusive \geq 1 \rightarrow$ . 

(r2) $invalid \geq 1,\ dirty = 0,\ shared = 0,\ exclusive = 0 \rightarrow$ 
     $invalid' = invalid - 1,\ exclusive' = exclusive + 1$. 

(r3) $invalid \geq 1,\ dirty \geq 1 \rightarrow$ 
     $invalid' = invalid - 1,\ dirty' = dirty - 1,\ shared' = shared + 2$. 

(r4) $invalid \geq 1,\ shared + exclusive \geq 1 \rightarrow$ 
     $invalid' = invalid - 1,\ shared' = shared + exclusive + 1,\ exclusive' = 0$. 

(r5) $dirty \geq 1 \rightarrow$ . 

(r6) $exclusive \geq 1 \rightarrow exclusive' = exclusive - 1,\ dirty' = dirty + 1$. 

(r7) $shared \geq 1 \rightarrow$ 
     $shared' = 0,\ invalid' = invalid + shared - 1,\ dirty' = dirty + 1$. 

(r8) $invalid \geq 1 \rightarrow invalid' = invalid + exclusive + dirty + shared - 1$, 
     $exclusive' = 0,\ shared' = 0,\ dirty' = 1$. 

(r9) $dirty \geq 1 \rightarrow dirty' = dirty - 1,\ invalid' = invalid + 1$. 

(r10) $shared \geq 1 \rightarrow shared' = shared - 1,\ invalid' = invalid + 1$. 

(r11) $exclusive \geq 1 \rightarrow exclusive' = exclusive - 1,\ invalid' = invalid + 1$. 

Fig. 2. EFSM for the Illinois Protocol: all variables range over non-negative integers. 



3.1 The EFSM for the Illinois Protocol 

Let invalid, dirty, shared, and exclusive be variables ranging over non-negative 
integers. The EFSM for the Illinois protocol is shown in Fig. 2. For simplicity, we 
omit the location and all equalities of the form $x'_i = x_i$. Furthermore, in a rule 
like ‘$G \rightarrow$’ all variables remain unchanged. Rule r1 of Fig. 2 represents read hit 
events: since no coherence action is needed, the only precondition is that there 
exists at least one cache in a valid state, i.e., $dirty + shared + exclusive \geq 1$. 
Rules r2-r4 correspond to read miss events where the global predicate $\mathcal{F}_{Ill}$ is 
expressed via guards containing tests for zero. Specifically, rule r2 represents a 
read miss such that $f_i = false$ for some $i$, i.e., one cache can move to 
valid-exclusive. The case in which a cache copies its content from a dirty cache and 
the two caches move simultaneously to shared is defined via rule r3. Rule r4 
applies whenever the block is copied from a cache in shared or valid-exclusive state. 
Rules r5-r7 model write hits. Specifically, rule r5 models a write in state dirty 
(no action is taken). Rule r6 models a write in state valid-exclusive where the 
state changes to dirty without bus invalidation signal. Rule r7 models a write in 
state shared where the copies in all other caches are invalidated. Note that, in 
rule r7 we implicitly assume that whenever $shared \geq 1$ then $dirty = 0$. We will 
(automatically) prove that this is an invariant of the protocol in Section 5. Rule 
r8 corresponds to a write miss: one cache moves to dirty and the copies in the 
other caches are invalidated. Finally, rules r9-r11 model replacement events. 
If the cache is in one of the states dirty, shared or exclusive its state changes to 
invalid. 
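
As a sanity check of the rules, the EFSM of Fig. 2 can be executed forwards for a bounded number of caches. The sketch below is an illustration, not the verification method of the paper: it transcribes rules r1 to r11 over tuples ordered as (invalid, dirty, shared, exclusive), starts from the state with k invalid caches, and uses hypothetical helper names. Because the number of caches is preserved, the forward search terminates for each fixed k, but it proves nothing for arbitrary k.

```python
from collections import deque

def successors(state):
    """One-step successors under rules r1-r11 of Fig. 2."""
    inv, dirty, sh, ex = state
    out = []
    if dirty + sh + ex >= 1:                                   # r1: read hit
        out.append(state)
    if inv >= 1 and dirty == 0 and sh == 0 and ex == 0:        # r2: read miss, no copy
        out.append((inv - 1, dirty, sh, ex + 1))
    if inv >= 1 and dirty >= 1:                                # r3: read miss, dirty copy
        out.append((inv - 1, dirty - 1, sh + 2, ex))
    if inv >= 1 and sh + ex >= 1:                              # r4: read miss, clean copy
        out.append((inv - 1, dirty, sh + ex + 1, 0))
    if dirty >= 1:                                             # r5: write hit in dirty
        out.append(state)
    if ex >= 1:                                                # r6: write hit in exclusive
        out.append((inv, dirty + 1, sh, ex - 1))
    if sh >= 1:                                                # r7: write hit in shared
        out.append((inv + sh - 1, dirty + 1, 0, ex))
    if inv >= 1:                                               # r8: write miss
        out.append((inv + ex + dirty + sh - 1, 1, 0, 0))
    if dirty >= 1:                                             # r9: replace dirty
        out.append((inv + 1, dirty - 1, sh, ex))
    if sh >= 1:                                                # r10: replace shared
        out.append((inv + 1, dirty, sh - 1, ex))
    if ex >= 1:                                                # r11: replace exclusive
        out.append((inv + 1, dirty, sh, ex - 1))
    return out

def reachable(k):
    """EFSM-states reachable from (k, 0, 0, 0)."""
    start = (k, 0, 0, 0)
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def violates(state):
    """INV1 (dirty >= 1 and shared >= 1) or INV2 (dirty >= 2)."""
    _, dirty, sh, _ = state
    return (dirty >= 1 and sh >= 1) or dirty >= 2
```

For small k, `any(violates(s) for s in reachable(k))` is expected to be false, which is consistent with the invariant assumed in rule r7.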

4 Protocol Validation = EFSM Invariant Checking 

Let $P$ be a protocol with global machine $\mathcal{M}_G$ and states $Q = \{q_1, \ldots, q_n\}$. Given 
a global state $G$, we define $\#G$ as the tuple of natural numbers $c = (c_1, \ldots, c_n)$ 
where $c_i$ = number of caches in $G$ with state $q_i$. Now, let $\mathcal{E}_P$ be the EFSM in 
which a state $c$ represents the set of global states $\{G \mid \#G = c\}$, and whose 
transitions are obtained according to Section 3. The following proposition relates 
runs in $\mathcal{M}_G$ and runs in $\mathcal{E}_P$. 

Proposition 1 (Adequacy of the Encoding). Let $\mathcal{M}_G$ and $\mathcal{E}_P$ be defined as 
above. Then, $c \rightarrow^{*} d$ in $\mathcal{E}_P$ if and only if there exist two global states $G$ and $G'$, 
such that $\#G = c$, $\#G' = d$ and $G \rightarrow^{*} G'$ in $\mathcal{M}_G$. 
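
The map #G used in the proposition is a simple counting function. A minimal Python sketch, assuming global states are tuples of state names and fixing an (arbitrary) order of the cache states, both of which are choices made here for illustration:

```python
from collections import Counter

# Assumed order of the cache states q_1, ..., q_n for the Illinois protocol.
STATES = ("invalid", "dirty", "shared", "exclusive")

def count_abstraction(global_state, states=STATES):
    """Return #G: the number of caches of global_state in each state."""
    occurrences = Counter(global_state)
    return tuple(occurrences[q] for q in states)
```

Global states that differ only by a permutation of the caches, such as `("shared", "invalid", "shared")` and `("invalid", "shared", "shared")`, are mapped to the same tuple, which is what makes the representation independent of cache identities.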

The previous property allows us to reduce the parameterized reachability problem 
for coherence protocols to a reachability problem for the corresponding EFSMs. 
Our approach to attack the second problem is based on the following points. 

Symbolic State Representation. In order to represent concisely (possibly infinite) 
sets of global states independently from the number of caches in the system, we 
use linear arithmetic constraints (= systems of linear inequalities) as a symbolic 
representation of sets of EFSM-states. This class of constraints is powerful enough 
to express initial and unsafe (target) sets of states for the verification problems 
we are interested in. For instance, the set of unsafe states of the Illinois 
protocol where at least 1 cache is in state shared and at least 1 cache is in state 
dirty can be represented finitely as the constraint $x_{shared} \geq 1 \wedge x_{dirty} \geq 1$. This 
is a crucial aspect in order to attack the parameterized reachability problem. 
In the rest of the paper we will use the lower-case letters $\varphi, \psi, \ldots$ to denote 
constraints and the upper-case letters $\Phi, \Psi, \ldots$ to denote sets (disjunctions) of 
constraints. Following [ACJT96], the denotation of a constraint $\varphi$ is defined as 
$[\![\varphi]\!] = \{ t \mid t \in \mathbb{N}^n \text{ satisfies } \varphi \}$. The definition is extended to sets in the natural 
way. Furthermore, we say that a constraint $\psi$ entails a constraint $\varphi$, written 
$\varphi \sqsubseteq \psi$, iff $[\![\psi]\!] \subseteq [\![\varphi]\!]$. We define a symbolic predecessor operator sym_pre over 
sets of constraints such that $[\![sym\_pre(\Phi)]\!] = pre([\![\Phi]\!])$ ($pre$ is defined in Section 
3). The operator is defined via the satisfiability test and variable elimination 
over $\mathbb{N}$ (the domain of interpretation of constraints) [BGP97, DP99]. Formally, 
for a given constraint $\varphi(x')$ with variables over $x'$, sym_pre is defined as follows 

$sym\_pre(\varphi(x')) = \bigcup_{i \in I} \{\ \exists\, x'.\ G_i(x) \wedge T_i(x, x') \wedge \varphi(x')\ \}$ 

where $x$ and $x'$ range over $\mathbb{N}$, and $G_i(x) \rightarrow T_i(x, x')$ is an EFSM transition rule 
for $i \in I$ ($I$ = index set). 

Backward Reachability. We apply a variation of the backward reachability 
algorithm of [ACJT96], where all operations on sets of states are lifted to the 
constraint level. We adopt backward reachability because of the result 
proved in [EFM99]: in contrast to forward reachability, the algorithm of 
[ACJT96] always terminates whenever the input EFSM is a broadcast protocol 
[EN98] and the set of unsafe states is upward-closed. A set $S \subseteq \mathbb{N}^n$ of states is 
upward-closed whenever for all tuples $t \in S$: if $t'$ is greater than or equal to $t$ w.r.t. the 
componentwise ordering of tuples, then $t' \in S$. As an example, the denotation 
of the constraint $x_{shared} \geq 1 \wedge x_{dirty} \geq 1$ (the variables $x_{invalid}$ and $x_{valid\text{-}ex}$ 
are implicitly $\geq 0$) is an upward-closed set over $\mathbb{N}^4$. As shown in [DEP99], the 
result of [EFM99] implies that the symbolic reachability algorithm using integer 
constraints to represent sets of states always terminates on inputs consisting of 
a broadcast protocol and of constraints that represent upward-closed sets of 
unsafe states. 
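
As a worked instance of the symbolic predecessor operator, consider the unsafe constraint $dirty \geq 2$ and rule r6 of Fig. 2 alone. The subscript in $sym\_pre_{r6}$ is notation introduced here only to indicate the restriction to that single rule.

```latex
\begin{align*}
sym\_pre_{r6}(dirty' \geq 2)
  &\equiv \exists\, x'.\ exclusive \geq 1 \wedge exclusive' = exclusive - 1 \wedge dirty' = dirty + 1 \\
  &\qquad\quad \wedge\ invalid' = invalid \wedge shared' = shared \wedge dirty' \geq 2 \\
  &\equiv exclusive \geq 1 \wedge dirty + 1 \geq 2 \\
  &\equiv exclusive \geq 1 \wedge dirty \geq 1 .
\end{align*}
```

Eliminating the primed variables amounts to substituting the right-hand sides of the assignments. The resulting constraint again denotes an upward-closed set, because neither the guard of r6 nor the target constraint contains a test for zero.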

Relaxation of Constraint Operations. In order to reduce the complexity of the 
manipulation of arithmetic constraints, we follow techniques used, e.g., in integer 
programming and program analysis. Every time we need to solve a system of 
inequalities $A \cdot x \leq b$ we ‘relax’ the condition that $x$ must be a vector of non-negative 
integers and look for a real solution of the corresponding linear problem. We 
apply the relaxation to the operations over constraints, i.e., we interpret the 
satisfiability test, variable elimination, and entailment test (needed to implement 
the symbolic backward reachability algorithm) over the domain of reals. The 
relaxation allows us to exploit efficient (polynomial) operations over the reals 
in contrast to potentially exponential operations over the integers. Note that 
this abstraction is applied only during the analysis and not at the semantic level 
of EFSMs. As we will discuss later, in many cases this method does not lose 
precision w.r.t. an analysis over the integers. Formally, given a constraint $\varphi$, we 
define $[\![\varphi]\!]_{\mathbb{R}}$ as the set of real solutions $\{ t \in \mathbb{R}_+^n \mid t \text{ satisfies } \varphi \}$. The entailment 
relation over $\mathbb{R}_+$ is then defined as $\varphi \sqsubseteq_{\mathbb{R}} \psi$ if and only if $[\![\psi]\!]_{\mathbb{R}} \subseteq [\![\varphi]\!]_{\mathbb{R}}$. When we 
apply the above relaxation to the symbolic predecessor operator, we obtain the 
new operator $sym\_pre_{\mathbb{R}}$ defined as follows 

$sym\_pre_{\mathbb{R}}(\varphi(x')) = \bigcup_{i \in I} \{\ \exists_{\mathbb{R}}\, x'.\ G_i(x) \wedge T_i(x, x') \wedge \varphi(x')\ \}$, 

where $x$ and $x'$ now range over non-negative real numbers, and $\exists_{\mathbb{R}}\, x. F \equiv \exists\, x \in \mathbb{R}_+. F$. 
The symbolic reachability algorithm we obtained is shown in Fig. 3. Note that 
this is the algorithm for backward reachability implemented in existing symbolic 
model checkers for hybrid and concurrent systems like HyTech [HHW97] 
and DMC [DP99]. Each step of the procedure Symb-Reach-over-R involves only 
polynomial time cost operations. In fact, $sym\_pre_{\mathbb{R}}$ can be implemented using 
a satisfiability test over $\mathbb{R}$ (e.g. using the simplex method, ‘polynomial’ in practical 
examples) and using Fourier-Motzkin variable elimination (for a fixed number of 
variables, polynomial in the size of the input constraints). Furthermore, the 
entailment test over $\mathbb{R}$ can be performed in polynomial time. In fact, the inclusion 
$[\![\varphi]\!]_{\mathbb{R}} \subseteq [\![\psi_1 \wedge \psi_2]\!]_{\mathbb{R}}$ holds if and only if $\varphi \wedge \neg\psi_1$ and $\varphi \wedge \neg\psi_2$ are not 
satisfiable. Thus, the entailment test reduces to a linear (in the size of the input 
constraints) number of satisfiability tests. In contrast, the cost of executing the 
same operations over $\mathbb{N}$ may be exponential. For instance, in [DEP99] we have 
shown (via a reduction from the Hitting Set problem) that checking $\varphi \sqsubseteq \psi$ (over $\mathbb{N}$) 
is already co-NP hard whenever $\varphi$ and $\psi$ are two instances of the additive 
constraints of Remark 1, Section 3 (i.e., constraints without equalities). 

Proc Symb-Reach-over-R($\Phi_o$, $\Phi_f$ : sets of constraints) 
  set $\Phi := \Phi_f$ and $\Psi := \emptyset$; 
  while $\Phi \neq \emptyset$ do 
    choose $\varphi \in \Phi$ and set $\Phi := \Phi \setminus \{\varphi\}$; 
    if there are no $\psi \in \Psi$ s.t. $\psi \sqsubseteq_{\mathbb{R}} \varphi$ then 
      set $\Psi := \Psi \cup \{\varphi\}$ and $\Phi := \Phi \cup sym\_pre_{\mathbb{R}}(\varphi)$; 
  if $sat_{\mathbb{R}}(\Phi_o \wedge \Psi)$ then return $\Phi_f$ is reachable from $\Phi_o$ 
  else return $\Phi_f$ is not reachable from $\Phi_o$ 

Fig. 3. Symbolic reachability. 
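
The difference between the entailment test over the integers and over the reals can be seen on a small pair of additive constraints. The example below is an illustration constructed here and is not taken from the paper; it is stated in terms of denotations so that it does not depend on the orientation of the ordering symbol.

```latex
\begin{align*}
\varphi &\equiv x + y \geq 1 \ \wedge\ x + z \geq 1 \ \wedge\ y + z \geq 1, \qquad
\psi \equiv x + y + z \geq 2. \\[4pt]
\text{Over } \mathbb{N}:\quad
  & \text{at most one of } x, y, z \text{ can be } 0 \text{ in a solution of } \varphi,
    \text{ hence } x + y + z \geq 2 \text{ and } [\![\varphi]\!] \subseteq [\![\psi]\!]. \\
\text{Over } \mathbb{R}_+:\quad
  & (\tfrac12, \tfrac12, \tfrac12) \in [\![\varphi]\!]_{\mathbb{R}}
    \text{ but } \tfrac12 + \tfrac12 + \tfrac12 = \tfrac32 < 2,
    \text{ hence } [\![\varphi]\!]_{\mathbb{R}} \not\subseteq [\![\psi]\!]_{\mathbb{R}}.
\end{align*}
```

Since every integer solution is also a real solution, an inclusion established over the reals also holds over the integers. The relaxed test can therefore only miss inclusions; in the procedure of Fig. 3 this means that a redundant constraint may be kept in the working set, not that a needed one is discarded.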

As in other verification methods for infinite-state systems (e.g. for hybrid 
systems [HHW97], FIFO systems [BW98], and parameterized concurrent pro- 
grams [ABJN99]), the algorithm is not guaranteed to terminate on every input. 
This is the price we have to pay in order to model realistic examples. We give 
next sufficient conditions for the theoretical termination of Symb-Reach-over-R. 
We postpone the evaluation of its practical termination to Section 5. 

Sufficient Conditions for Termination. As for its companion algorithm defined 
over the domain of integers, the procedure Symb-Reach-over-R always terminates, 
returning exact results, if both the guards of the input EFSM and the 
unsafe states are expressed via the additive constraints of Remark 1, Section 3. 
This result proves that, when applied to broadcast protocols, the algorithm of 
[ACJT96] formulated over constraints is robust under the relaxation from integers 
to reals of the constraint operations (robust = it always terminates and solves the 
reachability problem taken into consideration). Formally, we have the following result. 

Theorem 1. Given a broadcast protocol $P$, Symb-Reach-over-R($\Phi_o$, $\Phi_f$) solves 
the reachability problem ‘$\exists\, c_0 \in [\![\Phi_o]\!],\ \exists\, c_1 \in [\![\Phi_f]\!]$ such that $c_0 \rightarrow^{*} c_1$’ whenever 
$\Phi_o$ is a set of additive constraints (possibly extended with conjunctions of 
equalities of the form $x_i = c_i$, $c_i \in \mathbb{N}$), and $\Phi_f$ is a set of additive constraints. 

Sketch of the proof. Following the methodology of [AJ99, FS98], we need to prove the following lemmas: (1) given an additive constraint $\varphi$, we can effectively compute $\mathit{sym\_pre}_R(\varphi)$; (2) the class of additive constraints is closed under application of $\mathit{sym\_pre}_R$; and, finally, (3) the class of additive constraints equipped with the order $\sqsubseteq_R$ is a well-quasi ordering, i.e., there exist no infinite chains of additive constraints $\varphi_1 \ldots \varphi_i \ldots$ such that $\varphi_j \not\sqsubseteq_R \varphi_i$ for all $j < i$. Point (1) follows from our definition of $\mathit{sym\_pre}_R$, whereas points (2) and (3) are formally proved in [Del00]. For instance, the proof of lemma (2) is based on the following observation. Let $\varphi$ be an additive constraint and r be a transition $G(x) \rightarrow T(x, x')$. Then, we can compute $\mathit{sym\_pre}_R(\varphi)$ (restricted to r) by replacing all 'primed variables' in $\varphi$ with the right-hand side of the transformation $T(x, x')$, and by conjoining the resulting constraint with the guard $G(x)$. The special form of $T(x, x')$ (each variable occurs only once in the right-hand side of assignments) ensures that the resulting constraint is still an additive constraint. As a corollary of lemma (2), it follows that computing symbolically the predecessor of an additive constraint $\varphi$ over R and over N gives the same results (both computations amount to a replacement by equals). In other words, $\mathit{sym\_pre}_R$ gives accurate results, i.e., $[\![\mathit{sym\_pre}_R(\varphi)]\!] = [\![\mathit{sym\_pre}(\varphi)]\!] = \mathit{pre}([\![\varphi]\!])$. (Note: $[\![\cdot]\!]$ denotes integer solutions.) □ 
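
The substitution step behind lemma (2) can be illustrated with a small sketch. It assumes, as our reading of the text rather than a quotation of Remark 1, that an additive constraint is a conjunction of atoms of the form "sum of some variables >= c", and that each primed variable is assigned a sum of unprimed variables plus a constant. The names `sym_pre`, `GUARD` and `UPDATE`, and the write-miss-like transition itself, are hypothetical and chosen only for illustration; no satisfiability check or simplification is attempted.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# An atom (vars, c) stands for: sum of the variables in `vars` >= c.
Atom = Tuple[FrozenSet[str], int]
# A constraint is a conjunction of atoms.
Constraint = List[Atom]
# An update maps each primed variable x' to (vars, const): x' = sum(vars) + const.
Update = Dict[str, Tuple[FrozenSet[str], int]]


def sym_pre(phi: Constraint, guard: Constraint, update: Update) -> Optional[Constraint]:
    """Predecessor of `phi` under the transition `guard -> update`.

    Returns None when the substitution yields a trivially false atom.
    """
    result = list(guard)  # conjoin the guard G(x)
    for primed_vars, c in phi:
        new_vars: set = set()
        offset = 0
        for x in primed_vars:
            rhs_vars, const = update[x]
            # Each variable occurs at most once on the right-hand sides,
            # so the substituted atom is again a plain sum of variables.
            assert new_vars.isdisjoint(rhs_vars)
            new_vars |= rhs_vars
            offset += const
        bound = c - offset
        if not new_vars:
            if bound > 0:
                return None  # "0 >= bound" is false
            continue  # "0 >= bound" is trivially true
        result.append((frozenset(new_vars), bound))
    return result


ALL = frozenset({"invalid", "exclusive", "dirty", "shared"})
# Hypothetical transition, for illustration only: one invalid cache takes
# the line in dirty state and every other cache becomes invalid.
GUARD: Constraint = [(frozenset({"invalid"}), 1)]
UPDATE: Update = {
    "invalid": (ALL, -1),
    "dirty": (frozenset(), 1),
    "shared": (frozenset(), 0),
    "exclusive": (frozenset(), 0),
}
```

For this made-up transition, the predecessor of "dirty >= 2" is empty, because the transition always leaves exactly one dirty copy, while the predecessor of "invalid >= 2" is the guard conjoined with "invalid + exclusive + dirty + shared >= 3", which is again an additive constraint.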

To our knowledge, this result was not considered in previous works on 
well-structured systems [ACJT96, AJ99, FS98]. In the rest of the paper we will 
discuss some preliminary experimental results. 



5 Experimental Results 

We have applied HyTech and DMC to automatically verify safety properties for the MESI, Berkeley RISC, Illinois, Xerox PARC Dragon and DEC Firefly protocols [Han93]. The guards we need to model the Dragon and Firefly protocols are more complicated than those of the Illinois protocol. The results of the analysis are shown in Fig. 4. For instance, in the Illinois protocol the parameterized initial configuration is expressed as $\Phi_0 = \mathit{invalid} \geq 1, \mathit{exclusive} = 0, \mathit{dirty} = 0, \mathit{shared} = 0$. Similarly, we can represent the potentially unsafe states described in Section 2 as follows: $\Phi_1 = \mathit{invalid} \geq 0, \mathit{exclusive} \geq 0, \mathit{dirty} \geq 1, \mathit{shared} \geq 1$ (property INV1) and $\Phi_2 = \mathit{invalid} \geq 0, \mathit{exclusive} \geq 0, \mathit{dirty} \geq 2, \mathit{shared} \geq 0$ (property INV2). We automatically checked both properties using HyTech and DMC (without need of accelerations) as specified in Fig. 4. HyTech execution times are often better (HyTech is based on Halbwachs' efficient polyhedra library [Hal93]). However, the HyTech built-in command reach backward we use for the analysis does not terminate in two cases (see table). DMC terminates on all examples thanks to a set of built-in accelerations [DP99]. Similar techniques (e.g. extrapolation) are described in [HH95, LHR97] but they are not provided by the current version of HyTech. We have tried other experiments using HyTech: forward analysis using parameter variables (to represent the initial configurations) does not terminate in several examples; approximations based on the convex hull (using the built-in hull operator applied to intermediate collections of states) returned no interesting results. We have also experimented with other types of abstraction. Specifically, we have analyzed the EFSMs obtained by weakening the guards of the original descriptions (e.g. turning tests for zero into inequalities) so as to obtain EFSMs for which our algorithm always terminates. As shown by the 'question marks' in the results for Abstract Illinois in Fig. 4, with this abstraction we find errors that are not present in the original protocol (in fact, the resulting reachable states are a super-set of those of the Illinois protocol). 

Protocol                Unsafe-GS    HyTech-ET^a  HyTech-NS  DMC-ET^b  DMC-NS  Safe?
Mesi                    D≥2          0.77s        3          1.0s      3       yes
                        D≥1,S≥1      0.66s        2          0.6s      2       yes
Berkeley RISC           D≥2          0.52s        1          0.3s      1       yes
                        D≥1,S≥1      0.94s        3          1.5s      3       yes
University of Illinois  D≥2          1.06s        3          2.0s      3       yes
                        D≥1,S≥1      2.32s        4          10.3s     4       yes
DEC Firefly             D≥2          ↑            -          28.2s     7       yes
                        D≥1,S≥1      3.01s        4          11.4s     4       yes
Xerox PARC Dragon       D≥2          ↑            -          84.2s     6       yes
                        D≥1,Sc≥1     5.30s        5          25.1s     5       yes
                        D≥1,Sd≥1     5.26s        5          25s       5       yes
Abstract Illinois       D≥2          2.86s        5          16.9s     5       ?
                        D≥1,S≥1      8.14s        7          96.3s     7       ?

^a on a Sun-SPARCstation-5 OS 5.6    ^b on a Pentium 133 Linux 2.0.32 

Fig. 4. ET=Execution Time; NS=No. Steps (↑=diverges). Unsafe-Global States: 
D=Dirty, S=Shared, Sc=Shared-Clean, Sd=Shared-Dirty. Abstract Illinois is 
obtained by weakening the guards of Illinois. 



6 Related Works 

Our approach is inspired by the recent work of Emerson and Namjoshi on Pa- 
rameterized Model Checking [EN96, EN98b], and broadcast protocols [EN98]. 
As discussed in the paper, broadcast protocols are not general enough to model 
the global conditions required by the protocol we have validated in this paper. 
The verification technique proposed in [EN98] is an extension of the coverabi- 
lity graph for Petri Nets (forward reachability). For broadcast protocols, this 
construction is not guaranteed to terminate [EFM99]. In contrast, backward 
reachability always terminates when the target set of states is upward-closed 
[EFM99]. In [DEP99], the author in collaboration with Esparza and Podelski 
proposes efficient data structures for representing integer constraints for the 
verification of broadcast protocols. There exist specialized symbolic state ex- 
ploration techniques for the analysis of parameterized coherence protocols. In 
[PD95], Pong and Dubois propose the symbolic state model (SSM) for the repre- 
sentation of the state-space and the verification of protocols. Specifically, they 
apply an abstraction and represent sets of global states via repetition operators 
to indicate 0,1, or multiple caches in a particular state. In our EFSM-model we 
keep track of the exact number of processes in each state. SSM verification me- 
thod is based on a forward exploration with ad hoc expansion and aggregation 
rules. In [ID99], Norris Ip and Dill have incorporated the repetition operators 
in Murφ. Murφ automatically checks the soundness of the abstraction based on 
the repetition operators, verifies the correctness of an abstract state graph of 
a fixed size using on-the-fly state enumeration, and, finally, tries to generalize 
the results for systems with larger (unbounded) sizes. In contrast, our method is 
based on general purpose techniques (backward reachability and constraints to 
represent sets of states) that have been successfully applied to the verification 
of timed, hybrid and concurrent systems (see e.g. [HHW97, BGP97, DP99]). 
Being specialized to coherence protocols, SSM can also detect stale write-backs 
and livelocks. The verification of this type of property using our method is 
part of our future work. We are not aware of other applications of infinite-state 
symbolic model checkers based on arithmetical domains to verification of 
parameterized cache coherence protocols. Several approaches exist to attack the 
verification problem of parameterized concurrent systems. In [GS92], German 
and Sistla define an automatic method for verification of parameterized 
asynchronous systems (where processes are modeled in CCS-style). However, methods 
that handle global conditions like ours (e.g. [ABJN99]) are often semi-algorithms, 
i.e., they do not guarantee the termination of the analysis. Other methods based 
on regular languages have been proposed in [ABJN99, CGJ97]. Among 
semi-automatic methods that require the construction of abstractions and invariants 
we mention [BCG89, CGJ97, HQR99, McM99]. Automated generation of 
invariants has been studied e.g. in [CGJ97, LHR97]. Automated generation of 
abstract transition graphs for infinite-state systems has been studied in [GS97]. 
Symmetry reductions for parameterized systems have been considered, e.g., in 
[ID99, McM99, PD95]. 

Finally, in [Del00] the author shows that the method presented in this paper 
(backward reachability, constraint-based representation, relaxation) can be 
used as a general methodology to verify properties of parameterized synchronous 
systems. 



7 Conclusions 

We have proposed a new method for the verification of coherence protocols for any number of processors in the system. We have applied our method to successfully verify safety properties of several protocols taken from the literature [AB86, Han93, PD95]. This result is obtained using technology originally developed for the verification of hybrid and concurrent systems, namely HyTech [HHW97] and DMC [DP99]. In our approach we propose the following abstractions. We 'count' the number of caches in every possible protocol state, so that we get an integer system out of a parameterized protocol; we relax the constraint operations needed to implement the symbolic backward reachability algorithm for the resulting integer system. The abstraction based on the relaxation often gives accurate results (e.g. when intermediate results are additive constraints) and allows us to prove all the properties of our examples we were interested in. As discussed in Section 5, with other types of approximation techniques we might abstract away crucial properties of the original protocols. As future work, we plan to extend our method to other classes of coherence protocols (e.g. directory-based [Han93]), and properties (e.g., livelocks), and to study techniques to provide error traces for properties whose verification fails. 


Abstract. We introduce discrete pushdown timed automata that are timed automata with integer-valued clocks augmented with a pushdown stack. A configuration of a discrete pushdown timed automaton includes a control state, finitely many clock values and a stack word. Using a pure automata-theoretic approach, we show that the binary reachability (i.e., the set of all pairs of configurations $(\alpha, \beta)$, encoded as strings, such that $\alpha$ can reach $\beta$ through 0 or more transitions) can be accepted by a nondeterministic pushdown machine augmented with reversal-bounded counters (NPCM). Since discrete timed automata with integer-valued clocks can be treated as discrete pushdown timed automata without the pushdown stack, we can show that the binary reachability of a discrete timed automaton can be accepted by a nondeterministic reversal-bounded multicounter machine. Thus, the binary reachability is Presburger. By using the known fact that the emptiness problem is decidable for reversal-bounded NPCMs, the results can be used to verify a number of properties that cannot be expressed by timed temporal logics for discrete timed automata and CTL for pushdown systems. 



1 Introduction 

After the introduction of efficient automated verification techniques such as sym- 
bolic model-checking [16], finite state machines have been widely used for mode- 
ling reactive systems. Due to the limited expressiveness, however, they are not 
suitable for specifying most infinite state systems. Thus, searching for models to 
represent more general transition systems and analyzing the decidability of their 
verification problems such as reachability or model-checking is an important 
research issue. In this direction, several models have been investigated such as 
pushdown automata [4,12,17], timed automata [2] (and real-time logics [3,1,14]), 
and various approximations on multicounter machines [9,7]. 

A pushdown system is a finite state machine augmented by a pushdown stack. On the other hand, a timed automaton can be regarded as a finite state machine with a number of real-valued clocks. All the clocks progress synchronously with rate 1, and a clock can be reset to 0 at some transition. Each transition also comes with an enabling condition in the form of clock constraints (i.e., Boolean combinations of $x \# c$ and $x - y \# c$, where x and y are clocks, c is an integer constant, and # denotes >, <, or =; such constraints are also called regions). A standard region technique [2] (and more recent techniques [6,18]) can be used to analyze region reachability. 

In this paper, we consider integer-valued clocks. We call a timed automaton with integer-valued clocks a discrete timed automaton. A strictly more powerful system can be obtained by combining a pushdown system with a discrete timed automaton. That is, a discrete timed automaton is augmented with a pushdown stack (i.e., a discrete pushdown timed automaton). We give a characterization of binary reachability, defined as the set of pairs of configurations (control state and clock values, plus the stack word if applicable) $(\alpha, \beta)$ such that $\alpha$ can reach $\beta$ through 0 or more transitions. Binary reachability characterization is a fundamental step towards developing a model checking algorithm for discrete pushdown timed automata. From classical automata theory, it is known that the binary reachability of pushdown automata is context-free. For timed automata (with either real-valued clocks or integer-valued clocks), the region technique is not enough to give a characterization of the binary reachability. Recently, Comon et al. [10] showed that the binary reachability of timed automata (with real-valued clocks) is expressible in the additive theory of reals. They show that a timed automaton with real-valued clocks can be flattened into one without nested cycles. Their technique also works for discrete timed automata. However, it is not easy to deduce a characterization of the binary reachability of discrete pushdown timed automata by combining the above results. The reason is, as pointed out in their paper, that this flattening destroys the structure of the original automaton. That is, the flattened timed automaton accepts different sequences of transitions, though the binary reachability is still the same. Thus, their approach cannot be used to show the binary reachability of the discrete pushdown timed automata proposed in this paper, since under flattening the sequence of stack operations cannot be maintained. A class of Pushdown Timed Systems (with continuous clocks) was discussed in [5]. However, that paper focuses on region reachability instead of binary reachability. 

In this paper, we develop a new automata-theoretic technique to characterize 
the binary reachability of a discrete pushdown timed automaton. Our technique 
does not use the region technique [2] nor the flattening technique [10]. Instead, 
a nondeterministic pushdown multicounter machine (NPCM), which is a non- 
deterministic pushdown automaton with counters, is used. Obviously, without 
restricting the counter behaviors, even the halting problem is undecidable, since 
machines with two counters already have an undecidable halting problem. An 
NPCM is reversal-bounded if the number of counter reversals (a counter 
changing mode between nondecreasing and nonincreasing and vice-versa) is 
bounded by some fixed number independent of the computation. We show that the 
binary reachability of a discrete pushdown timed automaton can be accepted by 
a reversal-bounded nondeterministic pushdown multicounter machine. We also 
discuss the safety analysis problem. That is, given a property P and an initial 
condition I, which are two sets of configurations of a discrete pushdown timed 
automaton A, determine whether, starting from a configuration in I, A can only 
reach configurations in P. Using the above characterization and the known fact 
that the emptiness problem for reversal-bounded NPCMs is decidable, we show 
that the safety analysis problem is decidable for discrete pushdown timed auto- 
mata, as long as both the safety property and the initial condition are accepted 
by nondeterministic reversal-bounded multicounter machines. 

It is known that Presburger relations can be accepted by reversal-bounded 
multicounter machines. Therefore, it is immediate that the safety analysis pro- 
blem is decidable as long as both the safety property and the initial condition 
are Presburger formulas on clocks. A discrete timed automaton can be treated 
as a discrete pushdown timed automaton without the pushdown stack. We can 
show that the binary reachability of a discrete timed automaton can be accepted 
by a reversal-bounded nondeterministic multicounter machine (i.e., a reversal- 
bounded NPCM without the pushdown stack). That is, the binary reachability 
of a discrete timed automaton is Presburger. This result parallels the result in 
[10] that the binary reachability of a timed automaton with real-valued clocks 
is expressible in the additive theory of reals, although our approach is totally 
different. 

The characterization of binary reachability for discrete pushdown timed automata will lead us to formulate a model checking procedure for a carefully defined temporal logic. The logic can be used to reason about a class of timed pushdown processes. Due to space limitation, we omit it here. In fact, the binary reachability characterization itself already demonstrates a wide range of safety properties that can be verified for discrete pushdown timed automata. We will show this by investigating a number of examples of properties at the end of the paper. 



2 Discrete Pushdown Timed Automata 

A timed automaton [2] is a finite state machine augmented with a number of real-valued clocks. All the clocks progress synchronously with rate 1, except that a clock can be reset to 0 at some transition. In this paper, we consider integer-valued clocks. A clock constraint is a Boolean combination of atomic clock constraints in the following form: $x \# c$, $x - y \# c$, where # denotes ≤, ≥, <, >, or =, c is an integer, and x, y are integer-valued clocks. Let $C_X$ be the set of all clock constraints on clocks X. Let Z be the set of integers with $Z^+$ for nonnegative integers. Formally, a discrete timed automaton A is a tuple $\langle S, X, E \rangle$ where S is a finite set of (control) states, X is a finite set of clocks with values in $Z^+$, and $E \subseteq S \times 2^X \times C_X \times S$ is a finite set of edges or transitions. Each edge $\langle s, \lambda, l, s' \rangle$ denotes a transition from state s to state s' with enabling condition $l \in C_X$ and a set of clock resets $\lambda \subseteq X$. Note that $\lambda$ may be empty. Also note that since each pair of states may have more than one edge between them, A is, in general, nondeterministic. 

The semantics is defined as follows. $\alpha \in S \times (Z^+)^{|X|}$ is called a configuration, with $\alpha_x$ being the value of clock x and $\alpha_q$ being the state under this configuration. $\alpha \to^{A} \alpha'$ denotes a one-step transition along an edge $\langle s, \lambda, l, s' \rangle$ in A satisfying 

— The state s is set to a new location s', i.e., $\alpha_q = s$, $\alpha'_q = s'$. 

— Each clock changes according to the edge given. If there are no clock resets on the edge, i.e., $\lambda = \emptyset$, then clocks progress by one time unit, i.e., for each $x \in X$, $\alpha'_x = \alpha_x + 1$. If $\lambda \neq \emptyset$, then for each $x \in \lambda$, $\alpha'_x = 0$ while for each $x \notin \lambda$, $\alpha'_x = \alpha_x$. 

— The enabling condition is satisfied, that is, $l(\alpha)$ is true. 

We simply write $\alpha \to^{A} \alpha'$ if $\alpha$ can reach $\alpha'$ by a one-step transition. A path $\alpha_0 \cdots \alpha_k$ satisfies $\alpha_i \to^{A} \alpha_{i+1}$ for each i. Also write $\alpha \leadsto^{A} \beta$ if $\alpha$ reaches $\beta$ through a path. Given a set P of configurations of A, write the preimage $Pre^*(P)$ of P as the set of configurations that can reach a configuration in P, i.e., 

$Pre^*(P) =_{def} \{\alpha : \text{for some } \beta \in P,\ \alpha \leadsto^{A} \beta\}$. 

The following figure shows an example of a discrete timed automaton with 
two clocks $x_1$ and $x_2$. The following sequence of configurations is a path: 
$(s_0, x_1 = 0, x_2 = 0)$, $(s_1, x_1 = 0, x_2 = 0)$, $(s_0, x_1 = 1, x_2 = 1)$, $(s_1, x_1 = 1, x_2 = 0)$. 
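
The one-step semantics can be sketched in a few lines of Python. The two edges below are assumptions made for illustration: the figure itself is not reproduced here, so the guards and reset sets were simply chosen to be consistent with the path listed above. The names `step`, `is_path`, `EDGES` and `PATH` are likewise hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Clocks = Dict[str, int]
Config = Tuple[str, Clocks]  # (control state, clock values)
# An edge is (source state, reset set, enabling condition, target state).
Edge = Tuple[str, FrozenSet[str], Callable[[Clocks], bool], str]


def step(config: Config, edge: Edge) -> Optional[Config]:
    """Successor of `config` along `edge`, or None if the edge is not enabled."""
    state, clocks = config
    src, resets, guard, dst = edge
    if state != src or not guard(clocks):
        return None
    if resets:
        # Some clock is reset: reset clocks go to 0, the others keep their value.
        new = {x: (0 if x in resets else v) for x, v in clocks.items()}
    else:
        # No reset on the edge: every clock advances by one time unit.
        new = {x: v + 1 for x, v in clocks.items()}
    return (dst, new)


def is_path(configs: List[Config], edges: List[Edge]) -> bool:
    """True if each consecutive pair is linked by a one-step transition."""
    return all(
        any(step(a, e) == b for e in edges)
        for a, b in zip(configs, configs[1:])
    )


# Assumed edges (not taken from the figure):
EDGES: List[Edge] = [
    ("s0", frozenset({"x2"}), lambda c: c["x1"] - c["x2"] < 2, "s1"),
    ("s1", frozenset(), lambda c: c["x2"] >= 0, "s0"),
]

PATH: List[Config] = [
    ("s0", {"x1": 0, "x2": 0}),
    ("s1", {"x1": 0, "x2": 0}),
    ("s0", {"x1": 1, "x2": 1}),
    ("s1", {"x1": 1, "x2": 0}),
]
```

Under these assumed edges, `is_path(PATH, EDGES)` holds. The enabling condition is evaluated on the source configuration, matching the requirement that l(α) is true.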

The above defined A is a little different from the standard (discrete) timed automaton given in [2]. In that model, each state is assigned a clock constraint called an invariant, under which A can remain in the same control state with all the clocks synchronously progressing with rate 1 as long as the invariant is satisfied. It is easy to see that, when integer-valued clocks are considered, A's remaining in a state can be replaced by a self-looping transition with the invariant as the enabling condition and without clock resets. Each execution of such a transition causes all the clocks to progress by one time unit. Another difference is that in a standard timed automaton a state transition takes no time, even when the transition has no clock resets. In order to translate a standard timed automaton to our definition, we introduce a dummy clock. For each state transition t in a standard timed automaton, the translated transition t' is exactly the same except that the dummy clock is reset in t'. This ensures that all clock values remain the same when t has no clock resets. Thus, standard timed automata can be easily transformed into the ones defined above. Since the paper focuses on binary reachability, the ω-language accepted by a timed automaton is irrelevant here. Thus, event labels in a standard timed automaton are not considered in this paper. 

Fig. 1. An example discrete timed automaton 

Discrete timed automata can be further extended by allowing a pushdown stack. A discrete pushdown timed automaton A is a tuple $\langle \Gamma, S, X, E \rangle$ where $\Gamma$ is the stack alphabet, and S, X, E are the same as in the definition of a discrete timed automaton except that each edge includes a stack operation. That is, each edge e is in the form of $\langle s, \lambda, (\eta, \eta'), l, s' \rangle$ where $s, s' \in S$, $\lambda \subseteq X$ is the set of clock resets and $l \in C_X$ is the enabling condition. The stack operation is characterized by a pair $(\eta, \eta')$ with $\eta \in \Gamma$ and $\eta' \in \Gamma^*$. That is, replacing the top symbol $\eta$ of the stack by a word $\eta'$. A configuration is $\alpha \in (Z^+)^{|X|} \times S \times \Gamma^*$ with $\alpha_w \in \Gamma^*$ indicating the stack content. $\alpha \leadsto^{A} \beta$ can be similarly defined assuming that the stack contents in $\alpha$ and $\beta$ are consistent with the sequence of stack operations along the path. 

This paper focuses on the characterization of binary reachability $\leadsto^{A}$ for 
both discrete timed automata and discrete pushdown timed automata. Before 
we proceed to show the results, some further definitions are needed. 

A nondeterministic multicounter machine (NCM) M is a nondeterministic machine with a finite set of (control) states $Q = \{1, 2, \ldots, |Q|\}$, and a finite number of counters $x_1, \ldots, x_k$ with integer counter values. Each counter can add 1, subtract 1, or stay unchanged. Those counter assignments are called standard assignments. M can also test whether a counter is equal to, greater than, or less than an integer constant. Those tests are called standard tests. 

An NCM can be augmented with a pushdown stack. A nondeterministic pushdown multicounter machine (NPCM) M is a nondeterministic machine with a finite set of (control) states $Q = \{1, 2, \ldots, |Q|\}$, a pushdown stack with stack alphabet $\Gamma$, and a finite number of counters $x_1, \ldots, x_k$ with integer counter values. Both assignments and tests in M are standard. In addition, M can pop the top symbol from the stack or push a word in $\Gamma^*$ on the top of the stack. It is well-known that counter machines with two counters have an undecidable halting problem, and obviously the undecidability holds for machines augmented with a pushdown stack. Thus, we have to restrict the behaviors of the counters. One such restriction is to limit the number of reversals a counter can make. A counter is n-reversal-bounded if it changes mode between nondecreasing and nonincreasing at most n times. For instance, the following sequence of counter values: 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 3, 2, 1, 1, 1, 1, ... demonstrates only one counter reversal. A counter is reversal-bounded if it is n-reversal-bounded for some n. We note that a reversal-bounded M (i.e., each counter in M is reversal-bounded) does not necessarily limit the number of moves or the number of reachable configurations to be finite. 
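
As a small illustration of the definition, the following sketch counts the reversals in a finite sequence of counter values. The function name `count_reversals` is ours; plateaus (repeated values) are treated as compatible with either mode, which matches the example sequence in the text.

```python
from typing import Sequence


def count_reversals(values: Sequence[int]) -> int:
    """Number of switches between nondecreasing and nonincreasing mode."""
    reversals = 0
    mode = 0  # +1 while rising, -1 while falling, 0 before any change
    for prev, cur in zip(values, values[1:]):
        if cur == prev:
            continue  # a plateau fits both modes
        direction = 1 if cur > prev else -1
        if mode != 0 and direction != mode:
            reversals += 1
        mode = direction
    return reversals
```

Applied to the sequence 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 3, 2, 1, 1, 1, 1 from the text, the function reports a single reversal, while a sequence such as 0, 1, 0, 1 has two.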

Let $(j, v_1, \ldots, v_k, w)$ denote the configuration of M when it is in state $j \in Q$, counter $x_i$ has value $v_i \in Z$ for $i = 1, 2, \ldots, k$, and the string in the pushdown stack is $w \in \Gamma^*$ with the rightmost symbol being the top of the stack. Each integer counter value $v_i$ can be represented by a unary string $0^{v_i}$ ($1^{-v_i}$) when $v_i$ is positive (negative). Thus, a configuration $(j, v_1, \ldots, v_k, w)$ can be represented as a string by concatenating the unary representations of each $j, v_1, \ldots, v_k$ as well as the string w with a separator $\# \notin \Gamma$. For instance, $(1, 2, -2, w)$ can be represented by $0^1 \# 0^2 \# 1^2 \# w$. Similarly, an integer tuple $(v_1, \ldots, v_k)$ can also be represented by a string. Thus, in this way, a set of configurations and a set of integer tuples can be treated as sets of strings, i.e., a language. Note that a configuration $\alpha$ of a discrete (pushdown) timed automaton A can be similarly encoded as a string $[\alpha]$. 
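
A minimal sketch of such a string encoding follows. It assumes one particular convention that is our reading of the garbled passage, not a confirmed quotation: a nonnegative value v is written as a run of v zeros, a negative value as a run of |v| ones, and the fields are joined with a separator `#` that does not occur in the stack alphabet. The helper names `unary`, `encode` and `decode` are hypothetical.

```python
from typing import Sequence, Tuple


def unary(v: int) -> str:
    """Unary code: v zeros for v >= 0, |v| ones for v < 0 (assumed convention)."""
    return "0" * v if v >= 0 else "1" * (-v)


def encode(state: int, counters: Sequence[int], stack: str, sep: str = "#") -> str:
    """Encode (state, counter values, stack word) as a single string."""
    if sep in stack:
        raise ValueError("separator must not occur in the stack word")
    return sep.join([unary(state)] + [unary(v) for v in counters] + [stack])


def decode(text: str, sep: str = "#") -> Tuple[int, Tuple[int, ...], str]:
    """Inverse of `encode`; the stack word is the last field."""
    *fields, stack = text.split(sep)
    values = [-len(f) if f.startswith("1") else len(f) for f in fields]
    return values[0], tuple(values[1:]), stack
```

With a stack word `ab` standing in for w, the configuration (1, 2, -2, w) is encoded as `0#00#11#ab`, and `decode` recovers the original components.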

Note that the above defined M does not have an input tape; in this case it is 
used as a system specification rather than a language recognizer, in which we are 
more interested in the behaviors that M generates. When an NPCM (or NCM) M 
is used as a language recognizer, we attach a separate one-way read-only input 
tape to the machine and assign a state in Q as the final state. M accepts an 
input iff it can reach the final state. When M is reversal-bounded, the emptiness 
problem, i.e., whether M accepts some input, is known to be decidable. 

Theorem 1. The emptiness problem for reversal-bounded nondeterministic 
pushdown multicounter machines with a one-way input tape is decidable [15]. 

It has been shown in [13] that the emptiness problem for reversal-bounded 
nondeterministic multicounter machines (NCMs) with one-way input is decidable 
in $n^{ckr}$ time for some constant c, where n is the size of the machine, k is the 
number of counters, and r is the reversal-bound on each counter. We believe 
that a similar bound could be obtained for the case of NPCMs. 

Actually, Theorem 1 can be strengthened for the case of NCMs: 

Theorem 2. A set of n-tuples of integers is definable by a Presburger formula iff 
it can be accepted by a reversal-bounded nondeterministic multicounter machine 
[15]. 

A language is bounded if there exist finite words $w_1, \ldots, w_n$ such that each 
element belongs to $w_1^* \cdots w_n^*$. A nondeterministic reversal-bounded 
multicounter machine can be made deterministic on bounded languages. 

Theorem 3. If a bounded language L is accepted by a nondeterministic 
reversal-bounded multicounter machine, then L can also be accepted by a deterministic 
reversal-bounded multicounter machine [15]. 

For an NPCM M, we can define the preimage $Pre^*(P)$ of a set of configurations P similarly to be the set of all predecessors of configurations in P, i.e., $Pre^*(P) = \{t \mid t \text{ can reach some configuration } t' \text{ in } P \text{ in 0 or more moves}\}$. Recently, we have shown [11] that $Pre^*(P)$ can be accepted by a reversal-bounded NPCM assuming that M is reversal-bounded and P is accepted by a reversal-bounded nondeterministic multicounter machine. 

Theorem 4. Let M be a reversal-bounded nondeterministic pushdown multicounter machine. Suppose a set of configurations P is accepted by a reversal-bounded nondeterministic multicounter machine. Then $Pre^*(P)$ with respect to M can be accepted by a reversal-bounded nondeterministic pushdown multicounter machine. 



3 Main Results 

Let A be a discrete pushdown timed automaton with clocks $x_1, \ldots, x_k$. The binary reachability $\leadsto^{A}$ can be treated as a language $\{[\alpha] \# [\beta]^r : \alpha \leadsto^{A} \beta\}$ where $[\alpha]$ is the string encoding of configuration $\alpha$ and $[\beta]^r$ is the reverse string encoding of configuration $\beta$.^1 The two encodings are separated by a delimiter #. The main result claims that the binary reachability $\leadsto^{A}$ can be accepted by a reversal-bounded NPCM using standard tests and assignments. A itself can be regarded as an NPCM, when we refer to a clock as a counter. However, tests in A as clock constraints are not standard tests. Furthermore, A is not reversal-bounded since clocks can be reset for an unbounded number of times. 

The proof of the main result proceeds as follows. We first show that $\leadsto^{A}$ can be accepted by a reversal-bounded NPCM using nonstandard tests and assignments. Then, we show that these nonstandard tests can be made standard. Finally, these nonstandard assignments can be simulated by standard ones. Throughout the two simulations the counters remain reversal-bounded. 

First, we show that clocks $x_1, \ldots, x_k$ in A can be translated into reversal-bounded ones. Let $y_0, y_1, \ldots, y_k$ be another set of clocks such that $x_i = y_0 - y_i$ ($1 \leq i \leq k$). Let A' be a discrete pushdown timed automaton that is exactly the same as A, except 

— A' has clock $y_0$ that never resets. Intuitively, the now-clock $y_0$ denotes current time. 

— Each $y_i$ with $1 \leq i \leq k$ denotes the (last) time when a reset of clock $x_i$ happens. Thus, each reset of clock $x_i$ on an edge is replaced by updating $y_i$ to the current time, i.e., $y_i := y_0$. If $x_i$ does not reset on an edge, the value of $y_i$ is unchanged. Also, only if there is no clock reset on an edge, add an assignment $y_0 := y_0 + 1$ to the edge to indicate that the now-clock progresses with one time unit. Only these assignments can change $y_0$. 

— The enabling condition on each edge of A is replaced by substituting $x_i$ with $y_0 - y_i$. Note that the enabling conditions $x_i \# c$ and $x_i - x_j \# c$ become $y_0 - y_i \# c$ and $y_j - y_i \# c$, respectively. Thus, the resulting enabling conditions are Boolean combinations of $y_i - y_j \# c$ with $0 \leq i, j \leq k$ and c being an integer constant. 
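
The translation described in the list above can be checked on a concrete run with a short sketch. It simulates the clock updates of A and the counter updates of A' side by side and records whether the relation x_i = y_0 - y_i is preserved and whether every y counter is nondecreasing. Enabling conditions and the stack are ignored, and all names (`step_clocks`, `step_counters`, `to_counters`, `to_clocks`, `run`, the key `"now"` for y_0) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Clocks = Dict[str, int]    # values of x_1..x_k in A
Counters = Dict[str, int]  # values of y_1..y_k in A', plus "now" for y_0


def step_clocks(x: Clocks, resets: FrozenSet[str]) -> Clocks:
    """Clock update of A: reset the listed clocks, or advance all by one."""
    if resets:
        return {c: (0 if c in resets else v) for c, v in x.items()}
    return {c: v + 1 for c, v in x.items()}


def step_counters(y: Counters, resets: FrozenSet[str]) -> Counters:
    """Counter update of A': y_i := y_0 on reset, else y_0 := y_0 + 1."""
    new = dict(y)
    if resets:
        for c in resets:
            new[c] = new["now"]
    else:
        new["now"] += 1
    return new


def to_counters(x: Clocks, u: int) -> Counters:
    """Configuration of A' with y_0 = u and y_i = u - x_i."""
    y = {c: u - v for c, v in x.items()}
    y["now"] = u
    return y


def to_clocks(y: Counters) -> Clocks:
    """Recover x_i = y_0 - y_i."""
    return {c: y["now"] - v for c, v in y.items() if c != "now"}


def run(x: Clocks, schedule: List[FrozenSet[str]]) -> Tuple[bool, bool]:
    """Return (relation preserved, counters never decrease) along the run."""
    y = to_counters(x, max(x.values()))
    related, monotone = True, True
    for resets in schedule:
        x, new_y = step_clocks(x, resets), step_counters(y, resets)
        monotone = monotone and all(new_y[c] >= y[c] for c in y)
        y = new_y
        related = related and to_clocks(y) == x
    return related, monotone
```

Starting the counters at y_0 equal to the largest clock value keeps every y_i nonnegative. On any schedule of reset sets, `run` should report that the relation is preserved and that no counter ever decreases, which is the reason the counters of A' make no reversals.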

Counters $y_0, y_1, \ldots, y_k$ in A' do not reverse. The reason is that assignments that change the counter values are only in the form of $y_0 := y_0 + 1$ and $y_i := y_0$ for $1 \leq i \leq k$, and there is no way that a counter $y_i$ decreases. For a configuration $\alpha$ of A and $u \in Z^+$, write $\alpha^u$ to be a configuration of A' such that $\alpha^u_{y_0} = u$ ($y_0$'s value is u), and for each $1 \leq i \leq k$, $\alpha^u_{y_i} = u - \alpha_{x_i}$ ($y_i$ is the translation of $x_i$), and $\alpha^u_w = \alpha_w$ (the stack content is the same). Also write $\max_\alpha$ to be the maximal value of clocks $\alpha_{x_i}$ in $\alpha$ (note that each $\alpha_{x_i}$ is nonnegative by definition). Thus, $\alpha^{\max_\alpha}$ is a configuration $\alpha^u$ of A' with $y_0$'s value being $\max_\alpha$. It follows directly, by induction on the length of a path, that the binary reachability of A can be characterized by that of A' as follows. 

^1 The reason that we use the reverse encoding of $\beta$ will become apparent in the Proof 
of Theorem 6. 

Theorem 5. For any pair of configurations $\alpha$ and $\beta$ of a discrete pushdown timed automaton A, the following holds, 

$\alpha \leadsto^{A} \beta$ iff there exists $v \in Z^+$ with $v \geq \max_\alpha$ such that $\alpha^{\max_\alpha} \leadsto^{A'} \beta^{v}$. 

From the above theorem, it suffices for us to investigate the binary reachability $\leadsto^{A'}$ of A'. As mentioned above, A' is an NPCM with reversal-bounded counters $y_0, y_1, \ldots, y_k$. However, instead of standard tests, A' has tests that check an enabling condition by comparing the difference of two counters against an integer constant. Also the assignments include only $y_0 := y_0 + 1$ and $y_i := y_0$ for $1 \leq i \leq k$ in A', which are not standard assignments. The following theorem says the nonstandard tests can be made standard. 

Theorem 6. The binary reachability $\leadsto^{A'}$ of A' can be accepted by a reversal-bounded NPCM using standard tests and nonstandard assignments that are of the form $y_0 := y_0 + 1$ and $y_i := y_0$ with $1 \leq i \leq k$. 

Proof. We construct the reversal-bounded NPCM $M$ as required. Given a pair of
string encodings of configurations $\alpha^{A'}$ and $\beta^{A'}$ of $A'$ on $M$'s one-way input tape
(separated by a delimiter not in the stack alphabet; also recall that the encoding
of $\beta^{A'}$ has the stack word in $\beta^{A'}$ reversed), $M$ first copies $\alpha^{A'}$ into its
$k+1$ counters $y_0, y_1, \ldots, y_k$ and the stack. Thus, $M$'s input head stops at the
beginning of $\beta^{A'}$. $M$ starts simulating $A'$ as follows, with the stack operations in $A'$
being exactly simulated on its own stack. Tests in $A'$ are Boolean combinations
of $y_i - y_j \# c$ for $0 \le i, j \le k$. Using only standard tests, $M$ cannot directly
compare the difference of two counter values against an integer $c$ by storing $y_i - y_j$
in another counter, since each time this "storing" is done it will cause at least one
counter reversal, and we do not have a bound on the number of such tests. In the
following, we provide a technique to avoid such nonstandard tests. Assume $m$
is one plus the maximal absolute value of all the integer constants that appear
in the tests in $A'$. Denote the finite set $[m] =_{def} \{-m, \ldots, 0, \ldots, m\}$. $M$ uses
its finite control to build a finite table. For each pair of counters $y_i$ and $y_j$ with
$0 \le i, j \le k$, there is a pair of entries $a_{ij}$ and $b_{ij}$. Each entry can be regarded as a
finite state control variable with states in $[m]$. Intuitively, $a_{ij}$ is used to record
the difference between the values of two counters $y_i$ and $y_j$. $b_{ij}$ is used to record
the "future" value of the difference when a clock assignment $y_i := y_0$ occurs
in the future. During the computation of $A'$, when the difference goes beyond
$m$ or below $-m$, $a_{ij}$ stays the same as $m$ or $-m$. $M$ uses $a_{ij} \# c$ to do a test
$y_i - y_j \# c$. Doing this is always valid, as we will show later. Thus, $M$ only uses
standard tests. Below, "ADD 1" means adding one if the result does not exceed
$m$, otherwise it keeps the same value. "SUBTRACT 1" means subtracting one
if the result is not less than $-m$, otherwise it keeps the same value. In the
following, we show how to construct the table. When assignment $y_0 := y_0 + 1$ is
being executed by $A'$, $M$ updates the table as follows, for each $0 \le i, j \le k$:

- $a_{ij}$ stays the same if $i > 0$ and $j > 0$. That is, the now-clock's progressing
does not affect the difference between two non-now-clocks.

- $a_{ij}$ ADD 1 if $i = 0$ and $j > 0$, noticing that $y_i$ is the now-clock and $y_j$ is a
non-now-clock (thus it remains unchanged),

- $a_{ij}$ SUBTRACT 1 if $i > 0$ and $j = 0$, noticing that $y_j$ is the now-clock and
$y_i$ is a non-now-clock (thus it remains unchanged),

- $a_{ij}$ is always 0 if $i = 0$ and $j = 0$. The difference between two identical
now-clocks is always 0.

After updating all $a_{ij}$, entries $b_{ij}$ are updated as below, for each $0 \le i, j \le k$:

- $b_{ij} := a_{0j}$. Thus $b_{ij}$ is the value of $y_i - y_j$ assuming currently there is a jump
$y_i := y_0$.

Note that an edge in $A'$ cannot contain both forms of assignment, i.e.,
both $y_0 := y_0 + 1$ and $y_i := y_0$. Let $\tau \subseteq \{y_1, \ldots, y_k\}$ denote the assignments $y_i := y_0$
for $i \in \tau$ on an edge being executed by $A'$. $M$ updates the table as follows, for
each $0 \le i, j \le k$:

- $a_{ij} := 0$ if $i, j \in \tau$, noticing that both $y_i$ and $y_j$ are currently the same value
as the now-clock $y_0$,

- $a_{ij} := b_{ij}$ if $i \in \tau$ and $j \notin \tau$, noticing that $y_i$ currently has the same value as
the now-clock $y_0$ and the difference $y_i - y_j$ is prestored as $b_{ij}$,

- $a_{ij} := -b_{ji}$ if $i \notin \tau$ and $j \in \tau$, noticing that $y_i - y_j = -(y_j - y_i)$,

- $a_{ij}$ stays the same if $i \notin \tau$ and $j \notin \tau$, since clocks outside $\tau$ are not changed.

After updating all $a_{ij}$, entries $b_{ij}$ are updated as follows, for each $0 \le i, j \le k$:

- $b_{ij} := 0$ if $i, j \in \tau$, noticing that both $y_i$ and $y_j$ are currently the same value
as the now-clock $y_0$,

- $b_{ij} := a_{0j}$ if $i \in \tau$ and $j \notin \tau$, noticing that $y_i$ currently has the same value as
the now-clock $y_0$,

- $b_{ij} := -a_{0i}$ if $i \notin \tau$ and $j \in \tau$, noticing that $y_j$ currently is set to the
now-clock $y_0$,

- $b_{ij}$ stays the same if $i \notin \tau$ and $j \notin \tau$, noticing that $b_{ij}$ represents $y_0 - y_j$ and
in fact the two clocks $y_0$ and $y_j$ are unchanged after the transition.

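The initial values of $a_{ij}$ and $b_{ij}$ can be constructed directly from $\alpha^{A'}$ as follows,
for each $0 \le i, j \le k$:

- $a_{ij} := \alpha^{A'}_{y_i} - \alpha^{A'}_{y_j}$ if $|\alpha^{A'}_{y_i} - \alpha^{A'}_{y_j}| \le m$,

- $a_{ij} := m$ if $\alpha^{A'}_{y_i} - \alpha^{A'}_{y_j} > m$,

- $a_{ij} := -m$ if $\alpha^{A'}_{y_i} - \alpha^{A'}_{y_j} < -m$,

and for each $0 \le i, j \le k$: $b_{ij} := a_{0j}$.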

$M$ then simulates $A'$ exactly, except that it uses $a_{ij} \# c$ for a test $y_i - y_j \# c$ in $A'$,
with $-m < c < m$. We claim that doing this by $M$ is valid.

Claim. Each time after $M$ updates the table by executing a transition, $y_i - y_j \# c$
iff $a_{ij} \# c$, and $y_0 - y_j \# c$ iff $b_{ij} \# c$, for all $0 \le i, j \le k$ and for each integer
$c \in [m - 1]$.
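
The table construction can be exercised with a small Python sketch. It is a simplification under stated assumptions, not the construction of $M$ itself: only the $a$-table is kept, and where the text reads the prestored entry $b_{ij}$ the sketch reads row 0 of the table directly (the Claim states that $b_{ij}$ tracks $y_0 - y_j$). The function names, the random driver, and the requirement that the now-clock $y_0$ starts at least as large as every other counter are assumptions made for illustration.

```python
import random


def clamp(d, m):
    """Saturate a difference into the finite set [m] = {-m, ..., m}."""
    return max(-m, min(m, d))


def init_table(y, m):
    """Initial table: a[i][j] is the saturated value of y[i] - y[j]."""
    n = len(y)
    return [[clamp(y[i] - y[j], m) for j in range(n)] for i in range(n)]


def progress(a, m):
    """Table update for y_0 := y_0 + 1."""
    new = [row[:] for row in a]
    for j in range(1, len(a)):
        new[0][j] = min(a[0][j] + 1, m)    # ADD 1
        new[j][0] = max(a[j][0] - 1, -m)   # SUBTRACT 1
    return new


def reset(a, tau):
    """Table update for y_i := y_0 for every i in tau (tau is a subset of 1..k)."""
    n = len(a)
    new = [row[:] for row in a]
    for i in range(n):
        for j in range(n):
            if i in tau and j in tau:
                new[i][j] = 0
            elif i in tau:
                new[i][j] = a[0][j]        # y_i now equals y_0
            elif j in tau:
                new[i][j] = a[i][0]        # y_j now equals y_0
    return new


def simulate(y, m, steps, seed=0):
    """Run random progress/reset steps, checking the table against the real counters."""
    rng = random.Random(seed)
    y = list(y)
    a = init_table(y, m)
    for _ in range(steps):
        if rng.random() < 0.5:
            y[0] += 1
            a = progress(a, m)
        else:
            tau = {i for i in range(1, len(y)) if rng.random() < 0.5}
            for i in tau:
                y[i] = y[0]
            a = reset(a, tau)
        assert a == init_table(y, m)       # table equals the saturated true differences
    return y, a
```

Because every constant $c$ in a test satisfies $|c| < m$, comparing the saturated entry against $c$ gives the same answer as comparing the true difference, which is the content of the Claim.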
Proof of the Claim. We prove it by induction. Obviously, the Claim holds for
the initial values of all clocks $y_i$ (in configuration $\alpha^{A'}$) and the corresponding
entries $a_{ij}$ and $b_{ij}$, by the choice of $m$. Suppose that $A'$ is currently at
configuration $\gamma$ and the Claim holds. Thus, for all $0 \le i, j \le k$ and for each integer
$c \in [m - 1]$, $\gamma_{y_i} - \gamma_{y_j} \# c$ iff $a_{ij} \# c$ and $\gamma_{y_0} - \gamma_{y_j} \# c$ iff $b_{ij} \# c$ hold. Therefore, $\gamma$
satisfies an enabling condition in $A'$ iff the entries $a_{ij}$ satisfy the same enabling
condition by replacing $y_i - y_j$ with $a_{ij}$, noticing that $m$ is chosen such that it is
greater than the absolute value of any constant in all the enabling conditions in
$A'$. Assume $\gamma$ satisfies the enabling condition on an edge $e$ and $A'$ will execute
$e$ next. Thus, $M$, using the entries $a_{ij}$ to test the enabling condition, will also
execute the same edge. We use $\gamma'$ to denote the configuration after executing the
edge, and use $a'_{ij}$ and $b'_{ij}$ to denote the table entries after executing the edge.
We need to show, for all $0 \le i, j \le k$ and for each integer $c \in [m - 1]$, that

(*) $\gamma'_{y_i} - \gamma'_{y_j} \# c$ iff $a'_{ij} \# c$

and

(**) $\gamma'_{y_0} - \gamma'_{y_j} \# c$ iff $b'_{ij} \# c$

hold. There are two cases to be considered according to the form of the assignment.
Suppose the assignment on $e$ is a clock progress $y_0 := y_0 + 1$. After this
assignment, $\gamma'_{y_0} = \gamma_{y_0} + 1$ and $\gamma'_{y_i} = \gamma_{y_i}$ for each $1 \le i \le k$. On the other hand,
according to the updating algorithm above, the $a'_{ij}$ are updated for each $0 \le i, j \le k$
as follows, depending on the case. There are four subcases:

- If $i > 0$ and $j > 0$, then $\gamma'_{y_i} = \gamma_{y_i}$, $\gamma'_{y_j} = \gamma_{y_j}$, $a'_{ij} = a_{ij}$. The claim (*) holds
trivially.

- If $i = 0$ and $j > 0$, then $\gamma'_{y_i} = \gamma_{y_i} + 1$, $\gamma'_{y_j} = \gamma_{y_j}$, $a'_{ij} = a_{ij}$ ADD 1. Since $y_0$
is the only now-clock, all of $\gamma'_{y_i} - \gamma'_{y_j}$, $\gamma_{y_i} - \gamma_{y_j}$, $a'_{ij}$ and $a_{ij}$ are nonnegative. It
suffices to show that the claim holds for any $c \ge 0$, $c \in [m - 1]$. In fact, $\gamma'_{y_i} - \gamma'_{y_j} \# c$
iff $\gamma_{y_i} - \gamma_{y_j} \# c - 1$ iff $a_{ij} \# c - 1$ iff $a_{ij} + 1 \# c$. Also, $a_{ij} + 1 \# c$ iff $a'_{ij} \# c$, by
separating the cases for $a_{ij} = m$ and $a_{ij} < m$, and noticing that $c < m$.
Thus, (*) holds, i.e., $\gamma'_{y_i} - \gamma'_{y_j} \# c$ iff $a'_{ij} \# c$.

- If $i > 0$ and $j = 0$, the argument is similar to the one above.

- If $i = 0$ and $j = 0$, the Claim (*) holds trivially.

Note that under the assignment $y_0 := y_0 + 1$, $b'_{ij} := a'_{0j}$. Thus, (**) can be
shown using (*).

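When the assignment is of the form $y_i := y_0$ for $y_i \in \tau \subseteq \{y_1, \ldots, y_k\}$
(note that in this case the now-clock does not progress, i.e., $\gamma'_{y_0} = \gamma_{y_0}$), there
are four cases to consider in order to show (*) for all $0 \le i, j \le k$:

- If $i, j \in \tau$, then $\gamma'_{y_i} = \gamma'_{y_j} = \gamma_{y_0}$, $a'_{ij} = 0$ and therefore $\gamma'_{y_i} - \gamma'_{y_j} = a'_{ij} = 0$.
Thus, the Claim (*) trivially holds.

- If $i \in \tau$ and $j \notin \tau$, then $\gamma'_{y_i} = \gamma_{y_0}$, $\gamma'_{y_j} = \gamma_{y_j}$, $a'_{ij} = b_{ij}$. Thus, for each
$c \in [m - 1]$, $\gamma'_{y_i} - \gamma'_{y_j} \# c$ iff $\gamma_{y_0} - \gamma_{y_j} \# c$ iff (induction hypothesis) $b_{ij} \# c$ iff
$a'_{ij} \# c$. The Claim (*) holds.

- If $i \notin \tau$ and $j \in \tau$, the argument is similar to the one above.

- If $i \notin \tau$ and $j \notin \tau$, then $\gamma'_{y_i} = \gamma_{y_i}$, $\gamma'_{y_j} = \gamma_{y_j}$, and $a'_{ij} = a_{ij}$. Thus, the Claim
(*) holds trivially.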
Now we prove Claim (**) under this assignment for $\tau$. Again, there are four
cases to consider:

- If $i, j \in \tau$, then $b'_{ij} = 0$, noticing that $\gamma'_{y_i} = \gamma'_{y_0}$ and $\gamma'_{y_j} = \gamma'_{y_0}$. Claim (**)
holds.

- If $i \in \tau$ and $j \notin \tau$, then $b'_{ij} = a'_{0j}$. Claim (**) holds directly from Claim (*).

- If $i \notin \tau$ and $j \in \tau$, the argument is similar to the one above.

- If $i \notin \tau$ and $j \notin \tau$, then $b'_{ij} = b_{ij}$. In fact, $\gamma'_{y_0} = \gamma_{y_0}$ and $\gamma'_{y_j} = \gamma_{y_j}$. Thus, Claim
(**) holds directly from the induction hypothesis.

This ends the proof of the Claim. Thus, it is valid for $M$ to use $a_{ij} \# c$ to
do each test $y_i - y_j \# c$. At some point, $M$ guesses that it has reached the
configuration $\beta^{A'}$ by comparing the counter values and the stack content with $\beta^{A'}$
through reading the rest of the input tape. $M$ accepts iff such a comparison
succeeds. Clearly $M$ accepts $\leadsto^{A'}$. There is a slight problem when $M$ compares its
own stack content with the one on the one-way input tape in $\beta^{A'}$ by popping
the stack. The reason is that popping the stack reads the reverse of the
stack content. However, recall that the encoding of the stack word $\beta^{A'}_w$ on the
input tape is reversed. Thus, such a comparison can proceed. □

Assignments in $M$ constructed in the above proof, of the form $y_0 := y_0 + 1$
and $y_i := y_0$ with $1 \le i \le k$, are still not standard. We will now show that
these assignments can be made standard, while the machine is still reversal-bounded.
Let $M'$ be an NPCM that is exactly the same as $M$. $M'$ simulates
$M$'s computation from the configuration $\alpha^{A'}$. Initially, each $y_i := \alpha^{A'}_{y_i}$ as we
indicated in the above proof. However, each time that $M$ executes an assignment
$y_0 := y_0 + 1$, $M'$ increases all the counters by 1, i.e., $y_i := y_i + 1$ for each
$0 \le i \le k$. When $M$ executes an assignment $y_i := y_0$, $M'$ does nothing. The
stack operations in $M$ are faithfully simulated by $M'$ on its own stack. For each
$1 \le i \le k$, at some point, either initially or at the moment $y_i := y_0$ is being
executed by $M$, $M'$ guesses (only once for each $i$) that $y_i$ has already reached
the value given in $\beta^{A'}$. After such a guess for $i$, an execution of $y_0 := y_0 + 1$ will
not cause $y_i := y_i + 1$ as indicated above (i.e., $y_i$ will no longer be incremented).
However, after such a guess for $i$, a later execution of $y_i := y_0$ in $M$ will cause
$M'$ to abort abnormally (without accepting the input). At some point after all
$1 \le i \le k$ have been guessed, $M'$ guesses that it has reached the configuration
$\beta^{A'}$. Then, $M'$ compares its current configuration with the one on the rest of the
input tape, $\beta^{A'}$ (recall that the stack word in $\beta^{A'}$ is reversed on the input tape).
$M'$ accepts iff such a comparison succeeds. Clearly, $M'$ uses only assignments
$y_0 := y_0 + 1$ and $y_i := y_i + 1$ for $1 \le i \le k$. Thus, $M'$ is also reversal-bounded
and accepts $\leadsto^{A'}$. Therefore,

Theorem 7. The binary reachability of A' can be accepted by a reversal- 
bounded NPCM using standard tests and assignments. 

Combining the above theorem with Theorem 5 and noticing that v in Theo- 
rem 5 can be guessed, it follows immediately that. 







Theorem 8. The binary reachability of a discrete pushdown timed automaton 
can be accepted by a reversal-bounded NPCM using standard tests and assign- 
ments. 

A discrete timed automaton is a special case of a discrete pushdown timed 
automaton without the pushdown stack. The above proofs still work for discrete 
timed automata without considering stack operations. That is, 

Theorem 9. The binary reachability of a discrete timed automaton can be ac- 
cepted by a reversal-bounded multicounter machine using standard tests and as- 
signments. 

Combining the above theorem and Theorem 2, it is immediate that the binary
reachability of a discrete timed automaton is Presburger over clocks. This result
parallels the result in [10] that the binary reachability of a timed automaton
with real-valued clocks is expressible in the additive theory of reals. However,
our proof is totally different from the flattening technique in [10].

4 Verification Results 

The importance of the characterization of $\leadsto^{A}$ for a discrete pushdown timed
automaton $A$ is that the emptiness of reversal-bounded NPCMs is decidable
from Theorem 1. In this section, we will formulate a number of properties that
can be verified for discrete pushdown timed automata.

We first need some notation. We use $\alpha, \beta, \ldots$ to denote variables ranging over
configurations. We use $q$, $x$, $w$ to denote variables ranging over control states,
clock values and stack words respectively. Note that $\alpha_{x_i}$, $\alpha_q$ and $\alpha_w$ are still used
to denote the value of clock $x_i$, the control state and the stack word of $\alpha$. We
use a count variable $\#_a(w)$ to denote the number of occurrences of a character
$a \in \Gamma$ in a stack word variable $w$. An NPCM-term $t$ is defined as follows:^

$t ::= n \mid q \mid x \mid \#_a(\alpha_w) \mid \alpha_{x_i} \mid \alpha_q \mid t - t \mid t + t$

where $n$ is an integer and $a \in \Gamma$. An NPCM-formula $f$ is defined as follows:

$f ::= t > 0 \mid t \bmod n = 0 \mid \neg f \mid f \vee f$

where $n \ne 0$ is an integer^. Thus, $f$ is a Presburger formula over control state
variables, clock value variables and count variables. Let $F$ be a formula in the
following format:

$\bigvee_i (f_i \wedge \alpha^i \leadsto^{A} \beta^i)$

^ Control states can be interpreted over a bounded range of integers. In this way, an 
arithmetic operation on control states is well-defined. 

® We use (i,j, w) to denote the i-th character of w is the j-th symbol in F. Then, 
Theorem 10 is still correct when we include as an atomic formula in an 

NPCM-formula with i,j G Z^. 
where $f_i$, $\alpha^i$ and $\beta^i$ are a number of NPCM-formulas and configuration variables.
Write $\exists F$ for the closed formula in which each free variable in $F$ is existentially
quantified. Then, the property $\exists F$ can be verified. ^

Theorem 10. The truth value of $\exists F$ with respect to a discrete pushdown timed
automaton $A$ for any NPCM-formula $F$ is decidable.

Proof. Consider each disjunctive subformula $f_i \wedge \alpha^i \leadsto^{A} \beta^i$ in $F$. Since $f_i$ is
Presburger, (the domain of) $f_i$ can be accepted by an NCM $M_{f_i}$ from Theorem
2, and further from Theorem 3, since the domain of $f_i$ can be encoded as a
set of integer tuples (thus, bounded), $M_{f_i}$ can be made deterministic. From
Theorem 8, we can construct a reversal-bounded NPCM accepting (the domain
of) $\alpha^i \leadsto^{A} \beta^i$. It is obvious that $f_i \wedge \alpha^i \leadsto^{A} \beta^i$ can be accepted by a
reversal-bounded NPCM $M_i$ by "intersecting" the two machines. Now, $F$ is accepted by
the union of the machines $M_i$. Since reversal-bounded NPCMs are closed under union [15], we can
construct a reversal-bounded NPCM $M$ to accept $F$. Since deciding whether $\exists F$ is false is equivalent to
testing the emptiness of $M$, the theorem follows from Theorem 1. □

Although $\exists F$ cannot be used to specify liveness properties, it can be used to
specify many interesting safety properties. For instance, consider the following property:
"for any configurations $\alpha$ and $\beta$ with $\alpha \leadsto^{A} \beta$, clock $x_2$ in $\beta$ is the sum of
clocks $x_1$ and $x_2$ in $\alpha$, and symbol $a$ appears in $\beta$ twice as many times as symbol
$b$ does in $\alpha$."

This property can be expressed as $\forall\alpha \forall\beta (\alpha \leadsto^{A} \beta \rightarrow (\beta_{x_2} = \alpha_{x_1} + \alpha_{x_2} \wedge
\#_a(\beta_w) = 2\#_b(\alpha_w)))$. The negation of this property is equivalent to $\exists F$ for
some $F$. Thus, it can be verified. We also need to point out that

- Even without clocks, $\#_a(\beta_w) = 2\#_b(\alpha_w)$ indicates a nonregular set of stack
word pairs. Thus, this property cannot be verified by the model checking
procedures for pushdown systems [4,12,17].

- Even without the pushdown stack, $\beta_{x_2} = \alpha_{x_1} + \alpha_{x_2}$ is not a clock region
[2]. Thus, the classical region techniques (including [5] for Pushdown Timed
Systems) cannot verify this property. This is also pointed out in [10].

Note that in an NPCM-formula, the use of a stack word is limited to counting
the occurrences of a symbol, e.g., $\#_a(\alpha_w)$. In fact we can have the following more
general use. Let $P$ and $I$ be two sets of configurations of a discrete pushdown
timed automaton $A$. If, starting from a configuration in $I$, $A$ can only reach
configurations in $P$, then $P$ is a safety property with respect to the initial condition
$I$. The safety analysis problem is whether $P$ is a safety property with respect
to the initial condition $I$, given $P$ and $I$. The following theorem asserts that
the safety analysis problem is decidable for discrete pushdown timed automata
when both $P$ and $I$ are bounded languages and are accepted by reversal-bounded
NCMs.

^ If stack words in $\alpha^i$ and $\beta^i$ are bounded, i.e., in the form of $w_1^* \cdots w_k^*$ with $w_1, \ldots, w_k$
fixed, then $F$ can be further extended to allow disjunctions and conjunctions over
$(f_i \wedge \alpha^i \leadsto^{A} \beta^i)$ formulas and the following theorem still holds. The reason is that
reversal-bounded NPCMs are closed under conjunction when languages are bounded
[15].

Theorem 11. The safety analysis problem is decidable for discrete pushdown 
timed automata where both the safety property and the initial condition are boun- 
ded languages and accepted by reversal-bounded nondeterministic multicounter 
machines. 

Proof. Let $A$ be a discrete pushdown timed automaton. Let $P$ and $I$ be accepted
by reversal-bounded NCMs $M_P$ and $M_I$, respectively. Note that $P$ is a safety
property with respect to the initial condition $I$ iff $I \cap Pre^*(\neg P) = \emptyset$. That is,
if $P$ is a safety property, then, starting from a configuration in $I$, $A$ cannot reach
a configuration that is in the complement $\neg P$ of $P$. From Theorem 3, since
$P$ and $I$ are bounded, both $M_P$ and $M_I$ can be made deterministic. Thus, $\neg P$
can also be accepted by a reversal-bounded NCM. Therefore, from Theorem 8
and Theorem 4, we can construct a reversal-bounded NPCM $M_{pre}$ accepting
$Pre^*(\neg P)$. It is obvious that $I \cap Pre^*(\neg P)$ can also be accepted by a
reversal-bounded NPCM $M'$ by "intersecting" $M_I$ and $M_{pre}$. The theorem follows by
noticing that deciding $I \cap Pre^*(\neg P) = \emptyset$, i.e., testing the emptiness of $M'$, is decidable
from Theorem 1. □

Thus, from the above theorem, the following property can be verified: 

“starting from a configuration a with the stack word a^b'^"' for some n, A 
can only reach a configuration f3 satisfying: the clock x\ in f3 is the sum of clock 
X 2 and X 3 in /3, and the stack word is for some n and m.” 

The reason is that a^b^"^ and encoded as Presburger tuples (thus 

bounded). Therefore they can be accepted by reversal-bounded NCMs. 

Now let us look at $A$ without a pushdown stack, i.e., a discrete timed
automaton. Obviously, the above two theorems still hold for such $A$. However, since
now $\leadsto^{A}$ is Presburger from Theorem 9, we can do more. An NCM-term $t$ is
defined as follows:

$t ::= n \mid q \mid x \mid \alpha_{x_i} \mid \alpha_q \mid t - t \mid t + t$

where $n$ is an integer and $a \in \Gamma$. An NCM-formula $f$ is defined as follows:

$f ::= t > 0 \mid \neg f \mid f \vee f \mid \alpha \leadsto^{A} \beta \mid \forall\alpha(f) \mid \forall x(f) \mid \forall q(f)$.

Thus, $f$ is a Presburger formula over control state variables, clock value variables
and configuration variables. If $f$ is closed (i.e., without free variables),
then the truth value of $f$ is decidable since $f$ is Presburger. Thus, a property
formulated as a closed NCM-formula can be verified. Even when clocks are real-valued,
this is still true from [10]. The following is an example property which can be
verified:

"for all configurations $\alpha$ there exists a configuration $\beta$ such that $\alpha \leadsto^{A} \beta$ and
the clock $x_1$ in $\beta$ is the sum of clocks $x_1$ and $x_2$ in $\alpha$."

This property can be written as the following closed NCM-formula:

$\forall\alpha \exists\beta (\alpha \leadsto^{A} \beta \wedge \beta_{x_1} = \alpha_{x_1} + \alpha_{x_2})$.
NCM-formulas can be extended by considering event labels as follows.
Consider a discrete timed automaton $A$. Recall that an edge in $A$ does not have a
label. Now we assume that, just as in a standard timed automaton, each edge in
$A$ is labeled by a letter in a finite alphabet $\Gamma$. Let $R(\alpha, \beta, n_\Gamma)$ be a
predicate meaning that $\alpha$ can reach $\beta$ through a path such that, for each $a \in \Gamma$, the number
of occurrences of label $a$ in the path is equal to $n_a$, with $n_\Gamma$ being an array of
$n_a$ for $a \in \Gamma$. By introducing a new counter for each $a \in \Gamma$ and increasing
the counter whenever $M$ executes a transition labeled by $a$, we can construct a
reversal-bounded NCM $M$ as in the proof of Theorem 9 (the actual proof is
in Theorem 8). From Theorem 2, we have,

Theorem 12. R is Presburger. 

Thus, we can add atomic terms $n_a$ for $a \in \Gamma$ and an atomic formula
$R(\alpha, \beta, n_\Gamma)$ to the above definition of NCM-formulas. Then such closed
NCM-formulas can be verified for discrete timed automata with labels. The following
is an example property:

"For any configuration $\alpha$ there exists a configuration $\beta$ such that if the clock
$x_1$ in $\beta$ is the sum of clocks $x_1$ and $x_2$ in $\alpha$, then $\alpha$ can reach $\beta$ through a path
with the number of transitions labeled by $a$ being twice the number of transitions
labeled by $b$."

This property can be written as a closed NCM-formula,

$\forall\alpha \exists\beta (\beta_{x_1} = \alpha_{x_1} + \alpha_{x_2} \rightarrow \exists n_a \exists n_b (R(\alpha, \beta, n_a, n_b) \wedge n_a = 2 n_b))$.

Thus, it can be verified. 

5 Conclusion 

We consider discrete pushdown timed automata that are timed automata with 
integer-valued clocks augmented with a pushdown stack. Using a pure automata- 
theoretic approach, we show that the binary reachability can be accepted by a 
reversal-bounded nondeterministic pushdown multicounter machine. The proof 
reveals that, by replacing enabling conditions with a finite table, the control part 
(testing clock constraints) and the clock behaviors (clock progresses and resets) 
can be separated for a discrete (pushdown) timed automaton, while maintaining 
the structure of the automaton. By using the known fact that the emptiness 
problem for reversal-bounded nondeterministic pushdown multicounter machines 
is decidable, we show a number of properties that can be verified. 

Binary reachability characterization is a fundamental step towards a more
general model-checking procedure for discrete pushdown timed automata. It is
immediate to see that the region reachability in [2] can be similarly formulated
by using Theorem 11, as long as stack words are regular. Thus, we can use the
idea in [4] as well as in [12] to demonstrate a subset of $\mu$-calculus that has a
decidable decision procedure for a class of timed pushdown processes. We plan
to investigate this issue in future work. In the future we would also like to
investigate the complexity of the verification procedures we have developed in
this paper. The techniques in this paper will be used in the implementation of
a symbolic model checker for real-time specifications written in the specification
language ASTRAL [8].

Thanks to anonymous reviewers for a number of useful suggestions. 
Abstract. We consider a variant of the Boolean satisfiability problem where a
subset $\mathcal{E}$ of the propositional variables appearing in formula $F_{sat}$ encode a symmetric,
transitive, binary relation over $N$ elements. Each of these relational variables, $e_{i,j}$,
for $1 \le i < j \le N$, expresses whether or not the relation holds between elements
$i$ and $j$. The task is to either find a satisfying assignment to $F_{sat}$ that also satisfies
all transitivity constraints over the relational variables (e.g., $e_{1,2} \wedge e_{2,3} \Rightarrow e_{1,3}$),
or to prove that no such assignment exists. Solving this satisfiability problem is the
final and most difficult step in our decision procedure for a logic of equality with
uninterpreted functions. This procedure forms the core of our tool for verifying
pipelined microprocessors.

To use a conventional Boolean satisfiability checker, we augment the set of clauses
expressing $F_{sat}$ with clauses expressing the transitivity constraints. We consider
methods to reduce the number of such clauses based on the sparse structure of the
relational variables.

To use Ordered Binary Decision Diagrams (OBDDs), we show that for some sets
$\mathcal{E}$, the OBDD representation of the transitivity constraints has exponential size
for all possible variable orderings. By considering only those relational variables
that occur in the OBDD representation of $F_{sat}$, experiments show that we can
readily construct an OBDD representation of the relevant transitivity constraints
and thus solve the constrained satisfiability problem.



1 Introduction 

Consider the following variant of the Boolean satisfiability problem. We are given a
Boolean formula $F_{sat}$ over a set of variables $V$. A subset $\mathcal{E} \subseteq V$ symbolically encodes a
binary, symmetric, transitive relation over $N$ elements. Each of these relational variables,
$e_{i,j}$, where $1 \le i < j \le N$, expresses whether or not the relation holds between elements
$i$ and $j$. Typically, $\mathcal{E}$ will be "sparse," containing far fewer than the $N(N-1)/2$
possible variables. Note that when $e_{i,j} \notin \mathcal{E}$ for some value of $i$ and of $j$, this does not
imply that the relation does not hold between elements $i$ and $j$. It simply indicates that
$F_{sat}$ does not directly depend on the relation between elements $i$ and $j$.

A transitivity constraint is a formula of the form

$e_{[i_1,i_2]} \wedge e_{[i_2,i_3]} \wedge \cdots \wedge e_{[i_{k-1},i_k]} \Rightarrow e_{[i_1,i_k]}$ (1)

* This research was supported by the Semiconductor Research Corporation, Contract 99-DC-684
where $e_{[i,j]}$ equals $e_{i,j}$ when $i < j$ and equals $e_{j,i}$ when $i > j$. Let $Trans(\mathcal{E})$ denote the
set of all transitivity constraints that can be formed from the relational variables. Our task
is to find an assignment $\chi : V \rightarrow \{0, 1\}$ that satisfies $F_{sat}$, as well as every constraint in
$Trans(\mathcal{E})$. Goel et al. [GSZAS98] have shown this problem is NP-hard, even when $F_{sat}$
is given as an Ordered Binary Decision Diagram (OBDD) [Bry86]. Normally, Boolean
satisfiability is trivial given an OBDD representation of a formula.

We are motivated to solve this problem as part of a tool for verifying pipelined
microprocessors [VB99]. Our tool abstracts the operation of the datapath as a set of
uninterpreted functions and uninterpreted predicates operating on symbolic data. We prove
that a pipelined processor has behavior matching that of an unpipelined reference model
using the symbolic flushing technique developed by Burch and Dill [BD94]. The major
computational task is to decide the validity of a formula $F_{ver}$ in a logic of equality with
uninterpreted functions [BGV99a,BGV99b]. Our decision procedure transforms $F_{ver}$
first by replacing all function application terms with terms over a set of domain variables
$\{v_i \mid 1 \le i \le N\}$. Similarly, all predicate applications are replaced by formulas over a
set of newly-generated propositional variables. The result is a formula $F^*_{ver}$ containing
equations of the form $v_i = v_j$, where $1 \le i < j \le N$. Each of these equations is then
encoded by introducing a relational variable $e_{i,j}$, similar to the method proposed by Goel
et al. [GSZAS98]. The result of the translation is a propositional formula $encf(F^*_{ver})$
expressing the verification condition over both the relational variables and the
propositional variables appearing in $F^*_{ver}$. Let $F_{sat}$ denote $\neg encf(F^*_{ver})$, the complement of
the formula expressing the translated verification condition. To capture the transitivity
of equality, e.g., that $v_i = v_j \wedge v_j = v_k \Rightarrow v_i = v_k$, we have transitivity constraints of
the form $e_{[i,j]} \wedge e_{[j,k]} \Rightarrow e_{[i,k]}$. Finding a satisfying assignment to $F_{sat}$ that also
satisfies the transitivity constraints will give us a counterexample to the original verification
condition $F_{ver}$. On the other hand, if we can prove that there are no such assignments,
then we have proved that $F_{ver}$ is universally valid.

We consider three methods to generate a Boolean formula $F_{trans}$ that encodes the
transitivity constraints. The direct method enumerates the set of chord-free cycles in
the undirected graph having an edge $(i, j)$ for each relational variable $e_{i,j} \in \mathcal{E}$. This
method avoids introducing additional relational variables but can lead to a formula of
exponential size. The dense method uses relational variables $e_{i,j}$ for all possible values
of $i$ and $j$ such that $1 \le i < j \le N$. We can then axiomatize transitivity by forming
constraints of the form $e_{[i,j]} \wedge e_{[j,k]} \Rightarrow e_{[i,k]}$ for all distinct values of $i$, $j$, and $k$. This
will yield a formula that is cubic in $N$. The sparse method augments $\mathcal{E}$ with additional
relational variables to form a set of variables $\mathcal{E}^+$, such that the resulting graph is chordal
[Rose70]. We then only require transitivity constraints of the form $e_{[i,j]} \wedge e_{[j,k]} \Rightarrow e_{[i,k]}$
such that $e_{[i,j]}, e_{[j,k]}, e_{[i,k]} \in \mathcal{E}^+$. The sparse method is guaranteed to generate a smaller
formula than the dense method.
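
As an illustration of the dense method only, the following Python sketch enumerates the constraints $e_{[i,j]} \wedge e_{[j,k]} \Rightarrow e_{[i,k]}$ over all triples of distinct elements and checks an assignment against them. The representation of a relational variable as an ordered pair and the function names are assumptions made for this sketch; it is not the encoding used in the authors' tool.

```python
from itertools import combinations


def e(i, j):
    """Canonical name of the relational variable e_[i,j]: the pair with the smaller index first."""
    return (i, j) if i < j else (j, i)


def dense_constraints(n):
    """All constraints (premise1, premise2) => conclusion over elements 1..n."""
    out = []
    for i, j, k in combinations(range(1, n + 1), 3):
        ab, bc, ac = e(i, j), e(j, k), e(i, k)
        # Each triangle contributes three constraints, one per choice of conclusion edge.
        out.append(((ab, bc), ac))
        out.append(((ab, ac), bc))
        out.append(((bc, ac), ab))
    return out


def satisfies(assignment, constraints):
    """True iff the assignment (dict: variable -> bool) violates no constraint."""
    return all(
        not (assignment[p] and assignment[q]) or assignment[r]
        for (p, q), r in constraints
    )
```

The number of constraints produced is three per unordered triple, which grows cubically in $N$, in line with the remark in the text. An assignment derived from an actual equivalence relation passes the check, while one that sets $e_{1,2}$ and $e_{2,3}$ true and $e_{1,3}$ false does not.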

To use a conventional Boolean Satisfiability (SAT) procedure to solve our constrained
satisfiability problem, we run the checker over a set of clauses encoding both $F_{sat}$ and
$F_{trans}$. The latest version of the fgrasp SAT checker [M99] was able to complete
all of our benchmarks, although the run times increase significantly when transitivity
constraints are enforced.
When using Ordered Binary Decision Diagrams to evaluate satisfiability, we could
generate OBDD representations of $F_{sat}$ and $F_{trans}$ and use the apply algorithm to
compute an OBDD representation of their conjunction. From this OBDD, finding
satisfying solutions would be trivial. We show that this approach will not be feasible in
general, because the OBDD representation of $F_{trans}$ can be intractable. That is, for
some sets of relational variables, the OBDD representation of the transitivity constraint
formula $F_{trans}$ will be of exponential size regardless of the variable ordering. The
NP-completeness result of Goel et al. shows that the OBDD representation of $F_{trans}$ may
be of exponential size using the ordering previously selected for representing $F_{sat}$ as
an OBDD. This leaves open the possibility that there could be some other variable
ordering that would yield efficient OBDD representations of both $F_{sat}$ and $F_{trans}$. Our
result shows that transitivity constraints can be intrinsically intractable to represent with
OBDDs, independent of the structure of $F_{sat}$.

We present experimental results on the complexity of constructing OBDDs for the
transitivity constraints that arise in actual microprocessor verification. Our results show
that the OBDDs can indeed be quite large. We consider two techniques to avoid
constructing the OBDD representation of all transitivity constraints. The first of these, proposed
by Goel et al. [GSZAS98], generates implicants (cubes) of $F_{sat}$ and rejects those that
violate the transitivity constraints. Although this method suffices for small benchmarks,
we find that the number of implicants generated for our larger benchmarks grows
unacceptably large. The second method determines which relational variables actually occur
in the OBDD representation of $F_{sat}$. We can then apply one of our three encoding
techniques to generate a Boolean formula for the transitivity constraints over this reduced set
of relational variables. The OBDD representation of this formula is generally tractable,
even for the larger benchmarks.

Due to space limitations, this paper omits many technical details. More information,
including formal proofs, is included in [BV00].

2 Benchmarks 

Our benchmarks [VB99] are based on applying our verifier to a set of high-level
microprocessor designs. Each is based on the DLX RISC processor described by Hennessy
and Patterson [HP96]:

1xDLX-C: is a single-issue, five-stage pipeline capable of fetching up to one new
instruction every clock cycle. It implements six instruction types and contains an
interlock to stall the instruction following a load by one cycle if it requires the loaded
result. This example is comparable to the DLX example first verified by Burch and
Dill [BD94].

2xDLX-CA: has a complete first pipeline, capable of executing the six instruction
types, and a second pipeline capable of executing arithmetic instructions. This
example is comparable to one verified by Burch [Bur96].

2xDLX-CC: has two complete pipelines, i.e., each can execute any of the six instruction
types.

In all of these examples, the domain variables $v_i$, with $1 \le i \le N$, in $F_{ver}$ encode
register identifiers. As described in [BGV99a,BGV99b], we can encode the symbolic
Circuit                  Domain      Propositional   Equations
                         Variables   Variables

1xDLX-C                     13           42              27
1xDLX-Ct                    13           42              37
2xDLX-CA                    25           58             118
2xDLX-CAt                   25           58             137
2xDLX-CC                    25           70             124
2xDLX-CCt                   25           70             143
Buggy 2xDLX-CC   min.       22           56              89
                 avg.       25           69             124
                 max.       25           77             132

Table 1. Microprocessor Verification Benchmarks. Benchmarks with suffix "t" were modified
to require enforcing transitivity.



terms representing program data and addresses as distinct values, avoiding the need to 
have equations among these variables. Equations arise in modeling the read and write 
operations of the register file, the bypass logic implementing data forwarding, the load 
interlocks, and the pipeline issue logic. 

Our original processor benchmarks do not require enforcing transitivity in order to 
verify them. In particular, the formula $F_{sat}$ is unsatisfiable in all cases. This implies
that the constrained satisfiability problems are unsatisfiable as well. We are nonetheless 
motivated to study the problem of constrained satisfiability for two reasons. First, other 
processor designs might rely on transitivity, e.g., due to more sophisticated issue logic. 
Second, to aid designers in debugging their pipelines, it is essential that we generate 
counterexamples that satisfy all transitivity constraints. Otherwise the designer will be 
unable to determine whether the counterexample represents a true bug or a weakness of 
our verifier. 

To create more challenging benchmarks, we generated variants of the circuits that
require enforcing transitivity in the verification. For example, the normal forwarding
logic in the Execute stage of 1xDLX-C compares the two source registers ESrc1 and
ESrc2 of the instruction in the Execute stage to the destination register MDest of the
instruction in the Memory stage. In the modified circuit, we changed the bypass condition
ESrc1 = MDest to be ESrc1 = MDest ∨ (ESrc1 = ESrc2 ∧ ESrc2 = MDest). Given
transitivity, these two expressions are equivalent. For each pipeline, we introduced four
such modifications to the forwarding logic, with different combinations of source and
destination registers. These modified circuits are named 1xDLX-Ct, 2xDLX-CAt, and
2xDLX-CCt.
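
The effect of this modification can be checked by brute force. The following minimal sketch treats the three register comparisons as independent Boolean relational variables (the names `e_12`, `e_2m`, and `e_1m` are illustrative assumptions, not identifiers from the verifier) and lists the assignments on which the original and modified bypass conditions differ.

```python
from itertools import product

def transitive(e_12, e_2m, e_1m):
    # Three equalities over three registers violate transitivity exactly
    # when two of them hold and the third does not.
    return (e_12, e_2m, e_1m).count(True) != 2

def original(e_12, e_2m, e_1m):
    # ESrc1 = MDest
    return e_1m

def modified(e_12, e_2m, e_1m):
    # ESrc1 = MDest  or  (ESrc1 = ESrc2 and ESrc2 = MDest)
    return e_1m or (e_12 and e_2m)

# Assignments on which the two bypass conditions disagree.
differing = [a for a in product([False, True], repeat=3)
             if original(*a) != modified(*a)]
```

Under these assumptions the two conditions disagree only on an assignment that violates transitivity, which is why a verifier that ignores transitivity can be misled by the modified circuits.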

To study the problem of counterexample generation for buggy circuits, we generated 
105 variants of 2xDLX-CC, each containing a small modification to the control logic. 
Of these, 5 were found to be functionally correct, e.g., because the modification caused
the processor to stall unnecessarily, yielding a total of 100 benchmark circuits for
counterexample generation. 

Table 1 gives some statistics for the benchmarks. The number of domain variables $N$
ranges between 13 and 25, while the number of equations ranges between 27 and 143.
The verification condition formulas $F_{ver}$ contain between 42 and 77 propositional
variables expressing the operation of the control logic. These variables plus the relational
variables comprise the set of variables $V$ in the propositional formula $F_{sat}$. The circuits
with modifications that require enforcing transitivity yield formulas containing up to 19
additional equations. The final three lines summarize the complexity of the 100 buggy
variants of 2xDLX-CC. We apply a number of simplifications during the generation of
formula $F_{sat}$, and hence small changes in the circuit can yield significant variations in
the formula complexity.

3 Graph Formulation 

Our definition of Trans($\mathcal{E}$) (Equation 1) places no restrictions on the length or form of
the transitivity constraints, and hence there can be an infinite number. We show that we
can construct a graph representation of the relational variables and identify a reduced
set of transitivity constraints that, when satisfied, guarantees that all possible transitivity
constraints are satisfied. By introducing more relational variables, we can alter this graph
structure, further reducing the number of transitivity constraints that must be considered.

For variable set $\mathcal{E}$, define the undirected graph $G(\mathcal{E})$ as containing a vertex $i$ for
$1 \le i \le N$, and an edge $(i, j)$ for each variable $e_{i,j} \in \mathcal{E}$. For an assignment $\chi$ of
Boolean values to the relational variables, we will classify edge $(i, j)$ as a 1-edge when
$\chi(e_{i,j}) = 1$, and as a 0-edge when $\chi(e_{i,j}) = 0$.

A path is a sequence of vertices $[i_1, i_2, \ldots, i_k]$ having edges between successive
elements, i.e., $1 \le i_p \le N$ for all $p$ such that $1 \le p \le k$, and $(i_p, i_{p+1})$ is in $G(\mathcal{E})$ for
all $p$ such that $1 \le p < k$. We consider each edge $(i_p, i_{p+1})$ for $1 \le p < k$ to also be
part of the path. A cycle is a path of the form $[i_1, i_2, \ldots, i_k, i_1]$.

Proposition 1. An assignment to the variables in $\mathcal{E}$ violates transitivity if and only if
some cycle in $G(\mathcal{E})$ contains exactly one 0-edge.

A path $[i_1, i_2, \ldots, i_k]$ is said to be acyclic when $i_p \ne i_q$ for all $1 \le p < q \le k$. A
cycle $[i_1, i_2, \ldots, i_k, i_1]$ is said to be simple when its prefix $[i_1, i_2, \ldots, i_k]$ is acyclic.

Proposition 2. An assignment to the variables in $\mathcal{E}$ violates transitivity if and only if
some simple cycle in $G(\mathcal{E})$ contains exactly one 0-edge.

Define a chord of a simple cycle to be an edge that connects two vertices that are
not adjacent in the cycle. More precisely, for a simple cycle $[i_1, i_2, \ldots, i_k, i_1]$, a chord
is an edge $(i_p, i_q)$ in $G(\mathcal{E})$ such that $1 \le p < q \le k$, that $p + 1 < q$, and either $p \ne 1$
or $q \ne k$. A cycle is said to be chord-free if it is simple and has no chords.

Proposition 3. An assignment to the variables in $\mathcal{E}$ violates transitivity if and only if
some chord-free cycle in $G(\mathcal{E})$ contains exactly one 0-edge.
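
The cycle criterion of Proposition 1 can be tested without enumerating cycles: a cycle with exactly one 0-edge exists precisely when some 0-edge joins two vertices that are already connected by a path of 1-edges. The sketch below checks this with a union-find structure. The function name, the 0-based vertex numbering, and the dictionary encoding of the assignment are assumptions made for illustration.

```python
def violates_transitivity(num_vertices, assignment):
    """assignment maps each edge (i, j) of the graph to 0 or 1."""
    parent = list(range(num_vertices))

    def find(v):
        # Find the representative of v's component, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the endpoints of every 1-edge.
    for (i, j), value in assignment.items():
        if value == 1:
            parent[find(i)] = find(j)

    # A 0-edge inside one component closes a cycle with exactly one 0-edge.
    return any(value == 0 and find(i) == find(j)
               for (i, j), value in assignment.items())
```

For example, on a triangle with edges (0, 1) and (1, 2) set to 1 and edge (0, 2) set to 0, the check reports a violation, whereas setting two of the three edges to 0 does not.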

For a set of relational variables $\mathcal{E}$, we define $F_{trans}(\mathcal{E})$ to be the conjunction of
all transitivity constraints generated by enumerating the set of all chord-free cycles in
the graph $G(\mathcal{E})$. Each length $k$ cycle $[i_1, i_2, \ldots, i_k, i_1]$ yields $k$ constraints. It is easily
proved that an assignment to the relational variables will satisfy all of the transitivity
constraints if and only if it satisfies $F_{trans}(\mathcal{E})$.

Fig. 1. Class of Graphs with Many Chord-Free Cycles. For a graph with $n$ diamond-shaped
faces, there are $2^n + n$ chord-free cycles.
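
As an illustration of how a length $k$ cycle yields $k$ constraints, the sketch below produces one implication per edge of the cycle: the other $k - 1$ edges together imply that edge. The representation of a constraint as a (premises, conclusion) pair and the helper names are assumptions of this sketch.

```python
def edge_var(i, j):
    # Relational variables are unordered, so store the smaller index first.
    return (min(i, j), max(i, j))

def cycle_constraints(cycle):
    """cycle is a vertex list [i1, ..., ik]; the closing edge (ik, i1) is implied.

    Returns k constraints, each a pair (premises, conclusion) meaning that
    the conjunction of the premise edges implies the conclusion edge.
    """
    k = len(cycle)
    edges = [edge_var(cycle[p], cycle[(p + 1) % k]) for p in range(k)]
    constraints = []
    for q in range(k):
        premises = [e for p, e in enumerate(edges) if p != q]
        constraints.append((premises, edges[q]))
    return constraints
```

Each constraint corresponds to a single clause in which the premise variables appear negated and the conclusion variable appears positively.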



3.1 Enumerating Chord-Free Cycles 

To enumerate the chord-free cycles of a graph, we exploit the following properties. An
acyclic path $[i_1, i_2, \ldots, i_k]$ is said to have a chord when there is an edge $(i_p, i_q)$ in $G(\mathcal{E})$
such that $1 \le p < q \le k$, that $p + 1 < q$, and either $p \ne 1$ or $q \ne k$. We classify a
chord-free path as terminal when $(i_k, i_1)$ is in $G(\mathcal{E})$, and as extensible otherwise.

Proposition 4. A path $[i_1, i_2, \ldots, i_k]$ is chord-free and terminal if and only if the cycle
$[i_1, i_2, \ldots, i_k, i_1]$ is chord-free.

A proper prefix of path $[i_1, i_2, \ldots, i_k]$ is a path $[i_1, i_2, \ldots, i_j]$ such that $1 \le j < k$.

Proposition 5. Every proper prefix of a chord-free path is chord-free and extensible.

Given these properties, we can enumerate the set of all chord-free paths by breadth
first expansion. As we enumerate these paths, we also generate $C$, the set of all chord-free
cycles. Define $P_k$ to be the set of all extensible, chord-free paths having $k$ vertices, for
$1 \le k \le N$. As an initial case, we have $P_1 = \{[i] \mid 1 \le i \le N\}$, and we have $C = \emptyset$. At
each step we consider all possible extensions to the paths in $P_k$ to generate the set $P_{k+1}$
and to add some cycles of length $k + 1$ to $C$.
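
The breadth-first expansion can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: vertices are numbered from 0, each cycle is reported once by requiring its first vertex to be its smallest and by fixing one orientation, and paths with fewer than three vertices are always treated as extensible. The helper `diamond_chain` builds the family of graphs of Figure 1 under an assumed vertex numbering.

```python
def chord_free_cycles(num_vertices, edges):
    adj = {v: set() for v in range(num_vertices)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    cycles = []
    frontier = [(v,) for v in range(num_vertices)]  # the set P_1
    while frontier:
        next_frontier = []  # becomes P_{k+1}
        for path in frontier:
            first, last = path[0], path[-1]
            for w in adj[last]:
                # Canonical form: the first vertex is the smallest on the cycle.
                if w <= first or w in path:
                    continue
                # An edge from w to an interior vertex would be a chord.
                if any(w in adj[u] for u in path[1:-1]):
                    continue
                new_path = path + (w,)
                if len(new_path) >= 3 and first in adj[w]:
                    # Terminal path: it closes into a chord-free cycle.
                    if new_path[1] < new_path[-1]:  # keep one orientation only
                        cycles.append(new_path)
                else:
                    next_frontier.append(new_path)
        frontier = next_frontier
    return cycles

def diamond_chain(n):
    """Graph with n diamond-shaped faces plus one edge along the bottom.

    Junction vertices are 0..n, the upper vertex of face i is n + i and the
    lower vertex is 2n + i, giving 3n + 1 vertices in total.
    """
    edges = [(0, n)]
    for i in range(1, n + 1):
        upper, lower = n + i, 2 * n + i
        edges += [(i - 1, upper), (upper, i), (i - 1, lower), (lower, i)]
    return 3 * n + 1, edges
```

For `diamond_chain(n)` with n of at least 2, the enumeration should report the $2^n + n$ chord-free cycles stated in the caption of Figure 1.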

As Figure 1 indicates, there can be an exponential number of chord-free cycles in
a graph. In particular, this figure illustrates a family of graphs with $3n + 1$ vertices.
Consider the cycles passing through the $n$ diamond-shaped faces as well as the edge
along the bottom. For each diamond-shaped face $F_i$, a cycle can pass through either the
upper vertex or the lower vertex. Thus there are $2^n$ such cycles.

The columns labeled “Direct” in Table 2 show results for enumerating the chord-free
cycles for our benchmarks. For each correct microprocessor, we have two graphs: one
for which transitivity constraints played no role in the verification, and one (indicated
with a “t” at the end of the name) modified to require enforcing transitivity constraints.
We summarize the results for the transitivity constraints in our 100 buggy variants of
2xDLX-CC, in terms of the minimum, the average, and the maximum of each measurement.
We also show results for five synthetic benchmarks consisting of $n \times n$ planar
meshes $M_n$, with $n$ ranging from 4 to 8, where the mesh for $n = 6$ is illustrated in Figure
2. For all of the circuit benchmarks, the number of cycles, although large, appears to be
manageable. Moreover, the cycles have at most 4 edges. The synthetic benchmarks, on
the other hand, demonstrate the exponential growth predicted as worst case behavior.
The number of cycles grows quickly as the meshes grow larger. Furthermore, the cycles
can be much longer, causing the number of clauses to grow even more rapidly.

Circuit  Verts.  Direct                           Dense                       Sparse
                 Edges  Cycles     Clauses        Edges  Cycles  Clauses     Edges  Cycles  Clauses
...      ...     ...    ...        ...            ...    ...     ...         ...    ...     1,290
$M_7$    ...     ...    ...        1,472,184      1,176  18,424  55,272      206    408     1,224
$M_8$    ...     112    1,743,247  48,559,844     2,016  41,664  124,992     294    662     1,986

Table 2. Cycles in Original and Augmented Benchmark Graphs. Results are given for the three
different methods of encoding transitivity constraints.

3.2 Adding More Relational Variables 

Enumerating the transitivity constraints based on only the variables in $\mathcal{E}$ runs the risk of
generating a Boolean formula of exponential size. We can guarantee polynomial growth
by considering a larger set of relational variables. In general, let $\mathcal{E}'$ be some set of
relational variables such that $\mathcal{E} \subseteq \mathcal{E}'$, and let $F_{trans}(\mathcal{E}')$ be the transitivity constraint
formula generated by enumerating the chord-free cycles in the graph $G(\mathcal{E}')$.

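Proposition 6. If $\mathcal{E}$ is the set of relational variables in $F_{sat}$ and $\mathcal{E} \subseteq \mathcal{E}'$, then:

$F_{sat} \wedge F_{trans}(\mathcal{E}) \Leftrightarrow F_{sat} \wedge F_{trans}(\mathcal{E}')$.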

Our goal then is to add as few relational variables as possible in order to reduce the size 
of the transitivity formula. We will continue to use our path enumeration algorithm to 
generate the transitivity formula. 

3.3 Dense Enumeration 

For the dense enumeration method, let $\mathcal{E}_N$ denote the set of variables $e_{i,j}$ for all values
of $i$ and $j$ such that $1 \le i < j \le N$. Graph $G(\mathcal{E}_N)$ is a complete, undirected graph. In
this graph, any cycle of length greater than three must have a chord. Hence our algorithm
will enumerate transitivity constraints of the form $e_{[i,j]} \wedge e_{[j,k]} \Rightarrow e_{[i,k]}$, for distinct
values of $i$, $j$, and $k$. The graph has $N(N-1)/2$ edges and $N(N-1)(N-2)/6$ chord-free
cycles, yielding a total of $N(N-1)(N-2)/2 = O(N^3)$ transitivity constraints.
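
A short sketch makes the dense count concrete. It lists, for every triangle of the complete graph, the three implications among its edges. The function name and the (premises, conclusion) encoding are assumptions of this sketch.

```python
from itertools import combinations

def dense_constraints(n):
    """Triangle constraints over the complete graph on vertices 1..n."""
    constraints = []
    for i, j, k in combinations(range(1, n + 1), 3):
        e_ij, e_jk, e_ik = (i, j), (j, k), (i, k)
        # Each triangle yields three implications, one per edge.
        constraints.append(((e_ij, e_jk), e_ik))
        constraints.append(((e_ij, e_ik), e_jk))
        constraints.append(((e_jk, e_ik), e_ij))
    return constraints
```

For 9 vertices this gives 84 triangles and 252 constraints, and for 17 vertices 680 triangles and 2,040 constraints, in agreement with the “Dense” columns of Table 4.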

The columns labeled “Dense” in Table 2 show the complexity of this method for
the benchmark circuits. For the smaller graphs 1xDLX-C, 1xDLX-Ct, $M_4$, and $M_5$,
this method yields more clauses than direct enumeration of the cycles in the original 
graph. For the larger graphs, however, it yields fewer clauses. The advantage of the dense 
method is most evident for the mesh graphs, where the cubic complexity is far superior 
to exponential. 

3.4 Sparse Enumeration 

We can improve on both of these methods by exploiting the sparse structure of $G(\mathcal{E})$.
Like the dense method, we want to introduce additional relational variables to give a set
of variables $\mathcal{E}^+$ such that the resulting graph $G(\mathcal{E}^+)$ becomes chordal [Rose70]. That
is, the graph has the property that every cycle of length greater than three has a chord.

Chordal graphs have been studied extensively in the context of sparse Gaussian 
elimination. In fact, the problem of finding a minimum set of additional variables to add 
to our set is identical to the problem of finding an elimination ordering for Gaussian 
elimination that minimizes the amount of fill-in. Although this problem is NP-complete
[Yan81], there are good heuristic solutions. In particular, our implementation proceeds
as a series of elimination steps. On each step, we remove some vertex i from the graph. 
For every pair of distinct, uneliminated vertices j and k such that the graph contains 
edges {i,j) and {i, k), we add an edge (j, k) if it does not already exist. The original 
graph plus all of the added edges then forms a chordal graph. To choose which vertex to 
eliminate on a given step, our implementation uses the simple heuristic of choosing the 
vertex with minimum degree. If more than one vertex has minimum degree, we choose 
one that minimizes the number of new edges added. 
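
The elimination steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: vertices are numbered from 0, the function name is hypothetical, and remaining ties are broken by vertex index.

```python
def chordal_fill(num_vertices, edges):
    """Return the edges added by minimum-degree elimination."""
    adj = {v: set() for v in range(num_vertices)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(range(num_vertices))
    added = []

    def missing_pairs(v):
        # Pairs of uneliminated neighbors of v that are not yet adjacent.
        nbrs = sorted(adj[v] & remaining)
        return [(j, k) for idx, j in enumerate(nbrs)
                for k in nbrs[idx + 1:] if k not in adj[j]]

    while remaining:
        # Minimum degree first; ties go to the vertex adding the fewest edges.
        v = min(remaining,
                key=lambda u: (len(adj[u] & remaining), len(missing_pairs(u)), u))
        for j, k in missing_pairs(v):
            adj[j].add(k)
            adj[k].add(j)
            added.append((j, k))
        remaining.remove(v)
    return added
```

On a cycle of four vertices the procedure adds a single chord, and on a cycle of five vertices it adds two, which is the minimum needed to make those graphs chordal.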

The columns in Table 2 labeled “Sparse” show the effect of making the benchmark 
graphs chordal by this method. Observe that this method gives superior results to either 
of the other two methods. In our implementation we have therefore used the sparse 
method to generate all of the transitivity constraint formulas. 

4 SAT-Based Decision Procedures 

We can solve the constrained satisfiability problem using a conventional SAT checker
by generating a set of clauses $C_{trans}$ representing $F_{trans}(\mathcal{E}^+)$ and a set of clauses $C_{sat}$
representing the formula $F_{sat}$. We then run the checker on the combined clause set
$C_{trans} \cup C_{sat}$ to find satisfying solutions to $F_{trans}(\mathcal{E}^+) \wedge F_{sat}$.

In experimenting with a number of Boolean satisfiability checkers, we have found that
fgrasp [MS99] gives the most consistent results. The most recent version can be directed
to periodically restart the search using a randomly-generated variable assignment [M99].
This is the first SAT checker we have tested that can complete all of our benchmarks.
All of our experiments were conducted on a 336 MHz Sun UltraSPARC II with 1.2 GB
of primary memory.

As indicated by Table 3, we ran fgrasp on clause sets $C_{sat}$ and $C_{trans} \cup C_{sat}$, i.e., both
without and with transitivity constraints. For benchmarks 1xDLX-C, 2xDLX-CA, and
2xDLX-CC, the formula $F_{sat}$ is unsatisfiable. As can be seen, including transitivity
constraints increases the run time significantly. For benchmarks 1xDLX-Ct, 2xDLX-CAt,
and 2xDLX-CCt, the formula $F_{sat}$ is satisfiable, but only because transitivity is not
enforced. When we add the clauses for $F_{trans}$, the formula becomes unsatisfiable. For
the buggy circuits, the run times for $F_{sat}$ range from under 1 second to over 36 minutes.
The run times for $C_{trans} \cup C_{sat}$ range from less than one second to over 12 hours. In
some cases, adding transitivity constraints actually decreased the CPU time (by as much
as a factor of 5), but in most cases the CPU time increased (by as much as a factor of
69). On average (using the geometric mean) adding transitivity constraints increased
the CPU time by a factor of 2.3. We therefore conclude that satisfiability checking with
transitivity constraints is more difficult than conventional satisfiability checking, but the
added complexity is not overwhelming.

Table 3. Performance of fgrasp on Benchmark Circuits. Results are given both without and
with transitivity constraints.

5 OBDD-Based Decision Procedures 

A simple-minded approach to solving satisfiability with transitivity constraints using
OBDDs would be to generate separate OBDD representations of $F_{trans}$ and $F_{sat}$. We
could then use the Apply operation to generate an OBDD for $F_{trans} \wedge F_{sat}$, and then
either find a satisfying assignment or determine that the function is unsatisfiable. We
show that for some sets of relational variables $\mathcal{E}$, the OBDD representation of $F_{trans}(\mathcal{E})$
can be too large to represent and manipulate. In our experiments, we use the CUDD
OBDD package with variable reordering by sifting.

5.1 Lower Bound on the OBDD Representation of $F_{trans}(\mathcal{E})$

We prove that for some sets $\mathcal{E}$, the OBDD representation of $F_{trans}(\mathcal{E})$ may be of
exponential size for all possible variable orderings. As mentioned earlier, the
NP-completeness result proved by Goel et al. [GSZAS98] has implications for the
complexity of representing $F_{trans}(\mathcal{E})$ as an OBDD. They showed that given an OBDD
$G_{sat}$ representing formula $F_{sat}$, the task of finding a satisfying assignment of $F_{sat}$ that also
satisfies the transitivity constraints in Trans($\mathcal{E}$) is NP-complete in the size of $G_{sat}$. By
this, assuming P $\ne$ NP, we can infer that the OBDD representation of $F_{trans}(\mathcal{E})$ may
be of exponential size when using the same variable ordering as is used in $G_{sat}$. Our
result extends this lower bound to arbitrary variable orderings and is independent of the
P vs. NP problem.

Fig. 2. Mesh Graph $M_6$

Let $M_n$ denote a planar mesh consisting of a square array of $n \times n$ vertices. For
example, Figure 2 shows the graph for $n = 6$. Define $\mathcal{E}_{n \times n}$ to be a set of relational
variables corresponding to the edges in $M_n$. $F_{trans}(\mathcal{E}_{n \times n})$ is then an encoding of the
transitivity constraints for these variables.
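
For concreteness, the edge set of the mesh can be generated as in the following sketch. The row-major vertex numbering is an assumption made here for illustration.

```python
def mesh_edges(n):
    """Edges of the n x n planar mesh; vertex (r, c) is numbered r * n + c."""
    edges = []
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n:
                edges.append((v, v + 1))  # horizontal edge to the right
            if r + 1 < n:
                edges.append((v, v + n))  # vertical edge downward
    return edges
```

The mesh has $2n(n-1)$ edges, so `mesh_edges(8)` produces the 112 edges listed for $M_8$ in Table 2.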

Theorem 1. Any OBDD representation of $F_{trans}(\mathcal{E}_{n \times n})$ must have $\Omega(2^{n/4})$ vertices.

A complete proof of this theorem is given in [BV00]. We give only a brief sketch here.
Being a planar graph, the edges partition the plane into faces. The proof first involves a
combinatorial argument showing that for any partitioning of the edges into sets $A$ and
$B$, we can identify a set of at least $(n - 3)/4$ edge-independent, “split faces,” where a
split face has some of its edge variables in set $A$ and others in set $B$. The proof of this
property is similar to a proof by Leighton [Lei92, Theorem 1.21] that $M_n$ has a bisection
bandwidth of at least $n$, i.e., one must remove at least $n$ vertices to split the graph into
two parts of equal size.

Given this property, for any ordering of the OBDD variables, we can construct a
family of assignments to the variables in the first half of the ordering that must
lead to distinct vertices in the OBDD. That is, the OBDD must encode information about
each split face for the variables in the first half of the ordering so that it can correctly
deduce the function value given the variables in the last half of the ordering.

Corollary 1. For any set of relational variables $\mathcal{E}$ such that $\mathcal{E}_{n \times n} \subseteq \mathcal{E}$, any OBDD
representation of $F_{trans}(\mathcal{E})$ must contain $\Omega(2^{n/8})$ vertices.

The extra edges in E introduce complications, because they create cycles containing 
edges from different faces. As a result, the lower bound is weaker, because our proof 
requires that we find a set of vertex-independent, split faces. 

Our lower bounds are fairly weak, but this is more a reflection of the difficulty of
proving lower bounds. We have found in practice that the OBDD representations of
the transitivity constraint functions arising from benchmarks tend to be large relative
to those encountered during the evaluation of $F_{sat}$. For example, although the OBDD
representation of $F_{trans}(\mathcal{E}^+)$ for benchmark 1xDLX-Ct is just 2,692 nodes (a function
over 42 variables), we have been unable to construct the OBDD representations of this
function for either 2xDLX-CAt (178 variables) or 2xDLX-CCt (193 variables) despite
running for over 24 hours.

5.2 Enumerating and Eliminating Violations 

Goel et al. [GSZAS98] proposed a method that generates implicants (cubes) of the
function $F_{sat}$ from its OBDD representation. Each implicant is examined and discarded if
it violates a transitivity constraint. In our experiments, we have found this approach works
well for the normal, correctly-designed pipelines (i.e., circuits 1xDLX-C, 2xDLX-CA,
and 2xDLX-CC) since the formula $F_{sat}$ is unsatisfiable and hence has no implicants.
For all 100 of our buggy circuits, the first implicant generated contained no transitivity
violation, and hence we did not require additional effort to find a counterexample.

For circuits that do require enforcing transitivity constraints, we have found this
approach impractical. For example, in verifying 1xDLX-Ct by this means, we generated
253,216 implicants, requiring a total of 35 seconds of CPU time (vs. 0.1 seconds for
1xDLX-C). For benchmarks 2xDLX-CAt and 2xDLX-CCt, our program ran for over
24 hours without having generated all of the implicants. By contrast, circuits 2xDLX-CA
and 2xDLX-CC can be verified in 11 and 29 seconds, respectively. Our implementation
could be improved by making sure that we generate only implicants that are irredundant and
prime. In general, however, we believe that a verifier that generates individual implicants
will not be very robust. The complex control logic for a pipeline can lead to formulas
$F_{sat}$ containing very large numbers of implicants, even when transitivity plays only a
minor role in the correctness of the design.

5.3 Enforcing a Reduced Set of Transitivity Constraints



Circuit                          Verts.  Direct                    Dense                     Sparse
                                         Edges  Cycles  Clauses    Edges  Cycles  Clauses    Edges  Cycles  Clauses
1xDLX-Ct                         9       18     14      45         36     84      252        20     19      57
2xDLX-CAt                        17      44     101     395        136    680     2,040      49     57      171
2xDLX-CCt                        17      46     108     417        136    680     2,040      52     66      198
Reduced Buggy 2xDLX-CC (min.)    3       2      0       0          3      1       3          2      0       0
Reduced Buggy 2xDLX-CC (avg.)    12      17     19      75         73     303     910        21     14      42
Reduced Buggy 2xDLX-CC (max.)    19      52     378     1,512      171    969     2,907      68     140     420

Table 4. Graphs for Reduced Transitivity Constraints. Results are given for the three different
methods of encoding transitivity constraints based on the variables in the true support of $F_{sat}$.



One advantage of OBDDs over other representations of Boolean functions is that
we can readily determine the true support of the function, i.e., the set of variables on
which the function depends. This leads to a strategy of computing an OBDD representation
of $F_{sat}$ and intersecting its support with $\mathcal{E}$ to give a set $\hat{\mathcal{E}}$ of relational variables
that could potentially lead to transitivity violations. We then augment these variables to
make the graph chordal, yielding a set of variables $\hat{\mathcal{E}}^+$, and generate an OBDD
representation of $F_{trans}(\hat{\mathcal{E}}^+)$. We compute $F_{sat} \wedge F_{trans}(\hat{\mathcal{E}}^+)$ and, if it is satisfiable, generate a
counterexample.
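
The notion of true support can be illustrated without an OBDD package. The sketch below is a brute-force stand-in, not the OBDD-based computation described above: it finds the variables on which a Boolean function really depends by comparing cofactors over all assignments. The function and variable names are hypothetical.

```python
from itertools import product

def true_support(f, variables):
    """Variables v for which flipping v changes f under some assignment."""
    support = set()
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        for v in variables:
            flipped = dict(env)
            flipped[v] = not env[v]
            if f(env) != f(flipped):
                support.add(v)
    return support

# Example: e23 occurs in the expression, but the function does not depend on it.
variables = ["c", "e12", "e23"]
f = lambda env: (env["c"] and env["e12"]) or (env["e23"] and not env["e23"])
relational = {"e12", "e23", "e13"}
reduced = true_support(f, variables) & relational
```

In this example only `e12` survives the intersection, so transitivity constraints would need to be generated for a graph with a single edge. An OBDD yields the same information directly, since a reduced OBDD contains nodes only for variables in the true support.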

Table 4 shows the complexity of the graphs generated by this method for our benchmark
circuits. Comparing these with the full graphs shown in Table 2, we see that we
typically reduce the number of relational variables (i.e., edges) by a factor of 3 for the
benchmarks modified to require transitivity and by an even greater factor for the buggy
circuit benchmarks. The resulting graphs are also very sparse. For example, we can
see that both the direct and sparse methods of encoding transitivity constraints greatly
outperform the dense method.



Circuit                          OBDD Nodes                                                                      CPU
                                 $F_{sat}$   $F_{trans}(\hat{\mathcal{E}}^+)$   $F_{sat} \wedge F_{trans}(\hat{\mathcal{E}}^+)$   Secs.
1xDLX-C                          1           1                                  1                                                 0.2
1xDLX-Ct                         530         344                                1                                                 2
2xDLX-CA                         1           1                                  1                                                 11
2xDLX-CAt                        22,491      10,656                             1                                                 109
2xDLX-CC                         1           1                                  1                                                 29
2xDLX-CCt                        17,079      7,168                              1                                                 441
Reduced Buggy 2xDLX-CC (min.)    20          1                                  20                                                7
Reduced Buggy 2xDLX-CC (avg.)    3,173       1,483                              25,057                                            107
Reduced Buggy 2xDLX-CC (max.)    15,784      93,937                             438,870                                           2,466

Table 5. OBDD-based Verification. Transitivity constraints were generated for a reduced set of
variables $\hat{\mathcal{E}}$.



Table 5 shows the complexity of applying the OBDD-based method to all of our
benchmarks. The original circuits 1xDLX-C, 2xDLX-CA, and 2xDLX-CC yielded
formulas $F_{sat}$ that were unsatisfiable, and hence no transitivity constraints were required.
The 3 modified circuits 1xDLX-Ct, 2xDLX-CAt, and 2xDLX-CCt are more
interesting. The reduction in the number of relational variables makes it feasible to generate
an OBDD representation of the transitivity constraints. Compared to benchmarks
1xDLX-C, 2xDLX-CA, and 2xDLX-CC, we see there is a significant, although tolerable,
increase in the computational requirement to verify the modified circuits. This
can be attributed to both the more complex control logic and to the need to apply the
transitivity constraints.

For the 100 buggy variants of 2xDLX-CC, $F_{sat}$ depends on up to 52 relational
variables, with an average of 17. This yielded OBDDs for $F_{trans}(\hat{\mathcal{E}}^+)$ ranging up to
93,937 nodes, with an average of 1,483. The OBDDs for $F_{trans}(\hat{\mathcal{E}}^+) \wedge F_{sat}$ ranged
up to 438,870 nodes (average 25,057), showing that adding transitivity constraints does
significantly increase the complexity of the OBDD representation. However, this is just
one OBDD at the end of a sequence of OBDD operations. In the worst case, imposing
transitivity constraints increased the total CPU time by a factor of 2, but on average it
only increased by 2%. The memory required to generate $F_{sat}$ ranged from 9.8 to 50.9
MB (average 15.5), but even in the worst case the total memory requirement increased
by only 2%.

6 Conclusion 

By formulating a graphical interpretation of the relational variables, we have shown that
we can generate a set of clauses expressing the transitivity constraints that exploits the
sparse structure of the relation. Adding relational variables to make the graph chordal
eliminates the theoretical possibility of there being an exponential number of clauses
and also works well in practice. A conventional SAT checker can then solve constrained
satisfiability problems, although the run times increase significantly compared to
unconstrained satisfiability. Our best results were obtained using OBDDs. By considering only
the relational variables in the true support of $F_{sat}$, we can enforce transitivity constraints
with only a small increase in CPU time.

Abstract. The monadic logics M2L-Str and WS1S have been successfully
used for verification, although they are nonelementary decidable.
Motivated by ideas from bounded model checking, we investigate procedures
for bounded model construction for these logics. The problem is,
given a formula $\phi$ and a bound $k$, does there exist a word model for $\phi$
of length $k$. We give a bounded model construction algorithm for M2L-Str
that runs in a time exponential in $k$. For WS1S, we prove a negative
result: bounded model construction is as hard as validity checking, i.e.,
it is nonelementary. From this, negative results for other monadic logics,
such as S1S, follow. We also present preliminary tests using a SAT-based
implementation of bounded model construction; for certain problem classes
it can find counter-examples substantially faster than automata-based
decision procedures.



1 Introduction 

The monadic logics M2L-Str, WS1S, and S1S are among the most expressive
decidable logics known. The logic M2L-Str [11] is a logic on finite words and also
appears in the literature (with slight variations) under the names MSO[S] [20]
and SOM[+1] [19]. In the early 1960’s, Büchi and Elgot gave decision procedures
for these logics by exploiting the fact that models can be encoded as words
and that the language of models satisfying a formula can be represented by
an automaton [5,6,9]. These decision procedures provide nonelementary upper-bounds
for these logics, which are also the lower-bounds [16].

Despite their atrocious complexity, the decision procedures for M2L-Str and
WS1S have been implemented in numerous tools, e.g., Mona [14], Mosel [18],
MOSEL [12], and the STEP system [15], and have been successfully applied to
problems in diverse domains including hardware [2] and protocol [11] verification.
Not surprisingly though, many large systems cannot be verified due to state
explosion. This is analogous to state explosion in model checking where the
state-space is exponential in the number of state variables, except for monadic
logics the states in the constructed automaton can be nonelementary in the size
of the input formula! For LTL model checking, a way of finessing this problem
has recently been proposed: bounded model checking [4]. The idea is that one
can finitely represent counter-examples (using the idea of a loop, see §4.3), and,
by bounding the size of these representations, satisfiability checkers can be used
to search for them. This often succeeds in cases where symbolic model checking
fails.

Motivated by the bounded model checking approach and the goal of quick
generation of counter-examples for falsifiable formulae, we investigate an analogous
problem for monadic logics. Namely, given a formula $\phi$ and a natural
number $k$, determine if $\phi$ has a word model of length $k$. Since we are concerned
with constructing models for formulae, as opposed to checking their satisfiability
with respect to a given model, we call our problem bounded model construction
or BMC for short.

We show that for M2L-Str, given a formula $\phi$ and a natural number $k$,
we can generate a formula in quantified Boolean logic that is satisfiable if and
only if $\phi$ has a word model of length $k$. The formula generated is polynomial in
the size of $\phi$ and $k$ and can be tested for satisfiability in polynomial space. For
generating length $k$ counter-models, this yields a nonelementary improvement
over the automata-based decision procedure for M2L-Str. Moreover, we show
that the use of SAT-based techniques can have acceptable running times in
practice.

We also investigate bounded model construction for other monadic logics and
establish negative results. For WS1S we show that BMC is as hard as checking
validity, which is nonelementary. This result is somewhat surprising since WS1S
has the same expressiveness and complexity as M2L-Str and their decision procedures
differ only slightly. Indeed, there has been a recent investigation of the
differences of these logics by Klarlund who concluded that WS1S is preferable to
M2L-Str due to its simpler semantics and its wider applicability to arithmetic
problems [13]. Our results suggest that the issue is not so clear cut and depends
on whether error detection through counter-example generation versus full verification
is desired, that is, whether one is interested in finding a single model for
a formula or computing a description of all models. We also formulate BMC for
S1S and several first-order monadic logics and establish similar negative results.

We proceed as follows. In §2 we briefly review quantified Boolean logic and
finite automata on words. In §3 we describe the syntax and semantics of M2L-Str
and WS1S and their relationship to finite automata. In §4 we present the
bounded model construction approach and complexity results. In §5 we present
experimental results and in §6 we draw conclusions.



2 Background 

Boolean Logic Boolean formulae are built from the constants true and false,
variables $x \in V$, and are closed under the standard connectives. The formulae
are interpreted in $\mathbb{B} = \{0, 1\}$. A (Boolean) substitution $\sigma : V \to \mathbb{B}$ is a mapping
from variables to truth values that is extended homomorphically to formulae.
We say $\sigma$ satisfies $\phi$ if $\sigma(\phi) = 1$.

Quantified Boolean logic (QBL) extends Boolean logic (BL) by allowing
quantification over Boolean variables, i.e., $\forall x.\,\phi$ and $\exists x.\,\phi$. A substitution $\sigma$
satisfies $\forall x.\,\phi$ if $\sigma$ satisfies $\phi[\mathit{true}/x] \wedge \phi[\mathit{false}/x]$ and dually for $\exists x.\,\phi$. In the
remainder of the paper, we write $\sigma \models_{QBL} \phi$ to denote that $\sigma$ satisfies $\phi$.
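
The semantics of the quantifiers can be made concrete with a small evaluator. The following is a minimal sketch in which formulae are nested tuples; the tuple encoding and the function name `holds` are assumptions of this sketch. Quantifiers are evaluated by substituting both truth values for the bound variable, as in the definition above.

```python
def holds(formula, sigma):
    """Decide whether substitution sigma (a dict) satisfies a QBL formula."""
    tag = formula[0]
    if tag == "const":
        return formula[1]
    if tag == "var":
        return sigma[formula[1]]
    if tag == "not":
        return not holds(formula[1], sigma)
    if tag == "and":
        return holds(formula[1], sigma) and holds(formula[2], sigma)
    if tag == "or":
        return holds(formula[1], sigma) or holds(formula[2], sigma)
    if tag in ("forall", "exists"):
        x, body = formula[1], formula[2]
        # Evaluate the body with x replaced by true and by false.
        results = [holds(body, {**sigma, x: b}) for b in (True, False)]
        return all(results) if tag == "forall" else any(results)
    raise ValueError(f"unknown connective: {tag}")

# forall x. (x or y)
phi = ("forall", "x", ("or", ("var", "x"), ("var", "y")))
```

With this encoding, `holds(phi, {"y": True})` is true and `holds(phi, {"y": False})` is false. The evaluator reuses space across the two branches of each quantifier, which is consistent with the polynomial-space bound for QBL satisfiability, although its running time is exponential in the number of quantifiers.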

QBL is not more expressive than BL, but it is more succinct. The satisfiability
problem for Boolean logic is NP-complete [8], whereas it is PSPACE-complete
for QBL [17].

Automata on Words  Let $\Sigma$ denote a finite alphabet. $\Sigma^*$
(respectively $\Sigma^\omega$) denotes the set of finite (respectively
infinite) words over $\Sigma$. A finite automaton $A$ over $\Sigma$ is a tuple
$(S, s_0, \Delta, F)$ where $S$ is a nonempty finite set of states,
$s_0 \in S$ is the initial state, $\Delta \subseteq S \times \Sigma \times S$
is a transition relation, and $F \subseteq S$ is a set of final states. A run
of $A$ on a finite word $w = a_1 a_2 \ldots a_n$ (respectively, an infinite
word $w = a_1 a_2 \ldots$) is a finite sequence of states
$s_0 s_1 \ldots s_n$ (respectively, an infinite sequence of states
$s_0 s_1 \ldots$) with $(s_{i-1}, a_i, s_i) \in \Delta$ for every $i \geq 1$.
A finite word is accepted by an automaton if it has a run whose last state is
final. To accept infinite words, finite automata are equipped with a Büchi
acceptance condition, which says that an infinite word is accepted if it has a
run in which some final state occurs infinitely often. We will often use the
alphabet $\mathbb{B}^n$, with $n \in \mathbb{N}$. Note that $\mathbb{B}^0$
stands for the singleton set $\{()\}$, i.e., the set whose only member is the
degenerate tuple “()”.
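Acceptance of a finite word can be sketched by tracking the set of states
reachable by some run. The function name `accepts` and the small two-state
automaton used below are assumptions made for illustration only.

```python
from typing import Hashable, Iterable, Set, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (state, letter, state)


def accepts(initial: Hashable, delta: Set[Transition],
            final: Set[Hashable], word: Iterable[Hashable]) -> bool:
    """Return True iff some run on `word` ends in a final state."""
    current = {initial}
    for letter in word:
        # All states reachable from `current` by one transition on `letter`.
        current = {t for (s, a, t) in delta if s in current and a == letter}
    return bool(current & final)


# Illustrative automaton: words that start with 1 and then alternate 0, 1, ...
DELTA = {("even", 1, "odd"), ("odd", 0, "even")}
FINAL = {"odd"}
```

For example, `accepts("even", DELTA, FINAL, [1, 0, 1])` holds, whereas the
words `[1, 0]` and `[]` are rejected because their runs end in a non-final
state.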

3 Monadic Second-Order Logics on Finite Words 

In this section we provide background material on M2L-Str and WS1S. These
logics have the same syntax but slightly different semantics. We also explain
their relationship to regular languages.

Let $V_1 = \{x_i \mid i \in \mathbb{N}\}$ be a set of first-order variables
and $V_2 = \{X_i \mid i \in \mathbb{N}\}$ be a set of second-order variables.
We will use $n, p, q, \ldots$ as meta-variables ranging over $V_1$ and
$X, Y, \ldots$ as meta-variables ranging over $V_2$.



3.1 Language 

Monadic second order (MSO) formulae are formulae in a language of second-order
arithmetic specified by the grammar:

\[
\begin{array}{rcll}
t & ::= & 0 \mid p, & p \in V_1 \\
\phi & ::= & s(t, t) \mid X(t) \mid \neg\phi \mid \phi \vee \phi \mid
  \exists p.\,\phi \mid \exists X.\,\phi, & p \in V_1 \text{ and } X \in V_2
\end{array}
\]

Hence terms are built from the constant 0 and first-order variables. Formulae
are built from predicates $s(t, t')$ and $X(t)$ and are closed under
disjunction, negation, and quantification over first-order and second-order
variables. Other connectives and quantifiers can be defined using standard
classical equivalences, e.g.,
$\forall X.\,\phi \stackrel{\mathrm{def}}{=} \neg\exists X.\,\neg\phi$. In
other presentations, $s$ is usually a function. We have specified it as a
relation for reasons that will become apparent when we give the semantics. In
the remainder of this section, formula means MSO-formula.

3.2 Semantics

For $X$ a set, by $\mathcal{F}(X)$ we denote the set of finite subsets of $X$.
A (MSO) substitution $\sigma$ is a pair of mappings
$\sigma = (\sigma_1, \sigma_2)$, with $\sigma_1 : V_1 \to \mathbb{N}$ and
$\sigma_2 : V_2 \to \mathcal{F}(\mathbb{N})$, and for $x \in V_1$,
$\sigma(x) = \sigma_1(x)$ and for $X \in V_2$, $\sigma(X) = \sigma_2(X)$. With
this in hand, we can now define satisfiability for M2L-Str and WS1S.

The Logic M2L-Str  Formulae in M2L-Str are interpreted relative to a natural
number $k \in \mathbb{N}$. We will write $[k]$ for the set
$\{0, \ldots, k-1\}$ and we call the elements of $[k]$ positions. First-order
variables are interpreted as positions. The constant 0 denotes the natural
number 0 and the symbol $s$ is interpreted as the relation
$\{(i, j) \mid j = i + 1 \text{ and } i, j \in [k]\}$. Note that $k - 1$ has
no successor. Second-order variables denote subsets of $[k]$ and the formula
$X(t)$ is true when the position denoted by $t$ is in the set denoted by $X$.

More formally, the semantics of a formula $\phi$ is defined inductively
relative to a substitution $\sigma$ and a $k \in \mathbb{N}$. In the
following, we write $\sigma^k$ for the pair $(\sigma, k)$.

Definition 1  Satisfiability for M2L-Str

\[
\begin{array}{lll}
\sigma^k \models_{M2L} s(t, t') & \text{if} &
  \sigma(t') = 1 + \sigma(t) \text{ and } \sigma(t') \in [k] \\
\sigma^k \models_{M2L} X(t) & \text{if} & \sigma(t) \in \sigma(X) \\
\sigma^k \models_{M2L} \neg\phi & \text{if} &
  \sigma^k \not\models_{M2L} \phi \\
\sigma^k \models_{M2L} \phi_1 \vee \phi_2 & \text{if} &
  \sigma^k \models_{M2L} \phi_1 \text{ or } \sigma^k \models_{M2L} \phi_2 \\
\sigma^k \models_{M2L} \exists p.\,\phi & \text{if} &
  (\sigma[i/p])^k \models_{M2L} \phi \text{ for some } i \in [k] \\
\sigma^k \models_{M2L} \exists X.\,\phi & \text{if} &
  (\sigma[M/X])^k \models_{M2L} \phi \text{ for some } M \subseteq [k]
\end{array}
\]

If $\sigma^k \models_{M2L} \phi$, we say that $\sigma^k$ satisfies, or is a
model of, $\phi$. We call a formula $\phi$ valid, and we write
$\models_{M2L} \phi$, if for every natural number $k$ and substitution
$\sigma$, $\sigma^k$ satisfies $\phi$.
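For small $k$, Definition 1 can be checked directly by enumerating positions
and subsets of $[k]$. The following is a minimal brute-force sketch; the tuple
encoding of formulae (`"s"`, `"in"`, `"ex1"`, `"ex2"`) and the function names
are assumptions made for illustration, and the enumeration is exponential in
$k$ for every second-order quantifier.

```python
from itertools import chain, combinations

# Hypothetical encoding: a term is the string "0" (the constant) or the name
# of a first-order variable; formulae are
#   ("s", t, t2)       successor predicate s(t, t2)
#   ("in", X, t)       membership X(t)
#   ("not", f), ("or", f, g)
#   ("ex1", p, f)      first-order quantifier, p ranges over [k]
#   ("ex2", X, f)      second-order quantifier, X ranges over subsets of [k]


def subsets(k):
    """All subsets of [k] = {0, ..., k-1}."""
    return (frozenset(c) for c in chain.from_iterable(
        combinations(range(k), r) for r in range(k + 1)))


def value(t, sigma):
    return 0 if t == "0" else sigma[t]


def sat_m2l(f, sigma, k):
    """Return True iff (sigma, k) satisfies f according to Definition 1."""
    op = f[0]
    if op == "s":
        a, b = value(f[1], sigma), value(f[2], sigma)
        return b == a + 1 and b in range(k)
    if op == "in":
        return value(f[2], sigma) in sigma[f[1]]
    if op == "not":
        return not sat_m2l(f[1], sigma, k)
    if op == "or":
        return sat_m2l(f[1], sigma, k) or sat_m2l(f[2], sigma, k)
    if op == "ex1":
        return any(sat_m2l(f[2], {**sigma, f[1]: i}, k) for i in range(k))
    if op == "ex2":
        return any(sat_m2l(f[2], {**sigma, f[1]: m}, k) for m in subsets(k))
    raise ValueError(f"unknown construct: {op!r}")


# "Some position has a successor": false for k = 1, true for k = 2.
HAS_SUCC = ("ex1", "p", ("ex1", "q", ("s", "p", "q")))
```

The formula `HAS_SUCC` illustrates the role of the bound: with $k = 1$ the
only position, 0, has no successor inside $[k]$, so the formula fails, whereas
it holds for $k = 2$.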

The Logic WS1S  Whereas M2L-Str can be seen as a logic on bounded sets of
positions or, as we shall see, finite words, WS1S is best viewed as a logic
based on arithmetic. First-order variables range over $\mathbb{N}$ and are not
a priori bounded by any natural number. Second-order variables range over
finite subsets of the natural numbers, $\mathcal{F}(\mathbb{N})$, and are not
restricted to subsets of some $[k]$. Finally, the symbol $s$ is interpreted as
the successor relation over $\mathbb{N}$. Formally, we define satisfiability
in WS1S, $\sigma \models_{WS1S} \phi$, as follows:

Definition 2  Satisfiability for WS1S

\[
\begin{array}{lll}
\sigma \models_{WS1S} s(t, t') & \text{if} & \sigma(t') = 1 + \sigma(t) \\
\sigma \models_{WS1S} X(t) & \text{if} & \sigma(t) \in \sigma(X) \\
\sigma \models_{WS1S} \neg\phi & \text{if} &
  \sigma \not\models_{WS1S} \phi \\
\sigma \models_{WS1S} \phi_1 \vee \phi_2 & \text{if} &
  \sigma \models_{WS1S} \phi_1 \text{ or } \sigma \models_{WS1S} \phi_2 \\
\sigma \models_{WS1S} \exists p.\,\phi & \text{if} &
  \sigma[i/p] \models_{WS1S} \phi \text{ for some } i \in \mathbb{N} \\
\sigma \models_{WS1S} \exists X.\,\phi & \text{if} &
  \sigma[M/X] \models_{WS1S} \phi \text{ for some }
  M \in \mathcal{F}(\mathbb{N})
\end{array}
\]

A formula is valid in WS1S if it is satisfied by every substitution $\sigma$.

Word Models  Models in both M2L-Str and WS1S can be encoded as finite words.
Let $\phi(\overline{X})$ be a formula, where $\overline{X}$ is the tuple of
second-order variables $X_1, \ldots, X_n$ occurring free in $\phi$.¹ We
encode a M2L-Str model $\sigma^k$ for $\phi$ by the word
$w_{\sigma^k} \in (\mathbb{B}^n)^k$, such that the length of $w_{\sigma^k}$ is
$k$ and for every position $i \in [k]$,
$w_{\sigma^k}(i) = (b_1, \ldots, b_n)$ and for $1 \leq j \leq n$, $b_j = 1$
iff $i \in \sigma(X_j)$. We call $w_{\sigma^k}$ a word model for $\phi$ and
define $\mathcal{L}_{M2L}(\phi)$ as the set of all M2L-Str word models for
$\phi$. We shall also write $w \models_{M2L} \phi$ for
$\sigma^k \models_{M2L} \phi$, where $w$ encodes $\sigma^k$.

Similarly, a WS1S model $\sigma$ for $\phi$ can be encoded as a finite word
$w_\sigma$ such that $w_\sigma(i) = (b_1, \ldots, b_n)$ where $b_j = 1$ iff
$i \in \sigma(X_j)$. We also call $w_\sigma$ a word model for $\phi$ in WS1S.
We define $\mathcal{L}_{WS1S}(\phi)$ as the set of WS1S word models for
$\phi$. Note that the encoding of M2L-Str models as words is a bijection,
whereas this is not the case for WS1S. In particular, if $\sigma$ is a WS1S
model and $w_\sigma$ encodes it, then any finite word of the form
$w_\sigma a a \cdots a$, where $a$ is $(0, \ldots, 0) \in \mathbb{B}^n$, also
encodes $\sigma$. We shall also write $w \models_{WS1S} \phi$ for
$\sigma \models_{WS1S} \phi$, where $w$ encodes $\sigma$.



Example  Consider the formula
$\phi \stackrel{\mathrm{def}}{=} X(0) \wedge \forall p.\, X(p) \leftrightarrow
(\exists q.\, s(p, q) \wedge Y(q))$ and the substitution $\sigma$ with
$\sigma(X) = \{0, 2\}$ and $\sigma(Y) = \{0, 1, 3\}$. $\sigma^4$ is a model
for $\phi$ in M2L-Str and $\sigma$ is a model for $\phi$ in WS1S. The words
$w$ and $w'$ below encode $\sigma^4$ and $\sigma$, respectively.

      w    0 1 2 3            w'   0 1 2 3 4 5
      X    1 0 1 0            X    1 0 1 0 0 0
      Y    1 1 0 1            Y    1 1 0 1 0 0


As a second example, the formula $\exists X.\, \forall p.\, X(p)$ is valid in
M2L-Str, whereas it is unsatisfiable in WS1S.
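The word encoding of models can be sketched in a few lines. The helper names
`encode` and `strip_padding` are assumptions of this sketch; `strip_padding`
picks a canonical representative among the WS1S words that differ only by
trailing all-zero letters.

```python
def encode(sigma, variables, length):
    """Encode sigma as a word over B^n: letter i has bit j set iff
    position i is in sigma[variables[j]]."""
    return [tuple(1 if i in sigma[x] else 0 for x in variables)
            for i in range(length)]


def strip_padding(word):
    """Remove trailing all-zero letters (irrelevant for WS1S word models)."""
    word = list(word)
    while word and not any(word[-1]):
        word.pop()
    return word


SIGMA = {"X": {0, 2}, "Y": {0, 1, 3}}
W = encode(SIGMA, ["X", "Y"], 4)        # encodes sigma^4
W_PRIME = encode(SIGMA, ["X", "Y"], 6)  # one WS1S encoding of sigma
```

Here `W` is `[(1, 1), (0, 1), (1, 0), (0, 1)]`, and `W_PRIME` extends it with
two letters `(0, 0)`. Both words describe the same WS1S substitution, so
`strip_padding(W_PRIME) == W`; as M2L-Str word models they differ, because the
length of the word fixes the bound $k$.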



Connection to Regular Languages  We have seen that monadic formulae define
sets of word models. Büchi and Elgot proved in [5,9] that the languages
formalized by formulae in WS1S and M2L-Str are regular and, conversely, that
every regular language is both WS1S- and M2L-Str-definable. To show
regularity, they proved constructively that, given a formula $\phi$, there
exists an automaton that accepts all WS1S (respectively, M2L-Str) word models
for $\phi$. This construction yields a decision procedure: a closed formula is
valid in WS1S (respectively, in M2L-Str) iff its corresponding automaton
accepts the language $()^*$. This decision procedure (and indeed any decision
procedure for these logics) is nonelementary [16,21]: the minimal automaton
representing a formula of size $n$ may require space whose lower bound is a
stack of exponentials of height $n$.

As noted previously, in WS1S any word model over $\Sigma = \mathbb{B}^n$ can
be suffixed by arbitrarily many $(0, \ldots, 0) \in \mathbb{B}^n$ and the
result is again a word model. Hence we explain in which sense regular
languages are definable in both monadic logics, as this is not completely
straightforward. Let $\Sigma = \{a_1, \ldots, a_n\}$ and let
$\theta : \mathbb{B}^n \to \Sigma$ be the substitution defined by
$\theta(b_1, \ldots, b_n) = a_i$, where $b_j = 1$ iff $j = i$, and let $\sim$
be the congruence relation over $(\mathbb{B}^n)^*$ defined by $u \sim v$ iff
$u = x.(0, \ldots, 0)^i$ and $v = x.(0, \ldots, 0)^j$ with
$x \in (\mathbb{B}^n)^*$ and $i, j \in \mathbb{N}$. We straightforwardly
extend $\theta$ to words over $(\mathbb{B}^n)^*$, sets of words,
$\sim$-classes, and sets of $\sim$-classes. Now, for a regular language
$L \subseteq \Sigma^*$, we can construct formulae $\phi(X_1, \ldots, X_n)$ and
$\psi(X_1, \ldots, X_n)$ such that for M2L-Str we have
$L = \theta(\mathcal{L}_{M2L}(\phi(\overline{X})))$ and for WS1S we have
$L = \theta(\mathcal{L}_{WS1S}(\psi(\overline{X})) / {\sim})$.

Fig. 1. Automata for Example. Panel (c): $\exists X.\,
\mathit{alternate}_{WS1S}(X)$

¹ First-order variables can be encoded using second-order variables as we
will show in §4.1.

Example  Consider the automaton $A$ depicted in Figure 1(a) that accepts the
language $1(01)^* = \{1, 101, 10101, \ldots\}$. This language is defined by
the formula²

\[
\begin{array}{rcll}
\mathit{alternate}_{M2L}(X) & \stackrel{\mathrm{def}}{=} &
  \exists n.\ \neg\exists p.\ s(n, p) \;\wedge & (1) \\
& & X(0) \wedge X(n) \;\wedge & (2) \\
& & \forall p.\ p < n \rightarrow \exists q.\ s(p, q) \wedge
  (X(q) \leftrightarrow \neg X(p)) & (3)
\end{array}
\]

interpreted in M2L-Str. (1) formalizes that $n$ denotes the last position
($k - 1$ in the notation of Definition 1). (2) states that the first and last
positions are in $X$, and, by (3), the positions in $X$ alternate. Observe
that if we existentially quantify the variable $X$ in
$\mathit{alternate}_{M2L}(X)$, then we obtain a closed formula that is neither
valid nor unsatisfiable; its corresponding automaton, given in Figure 1(b), is
the same as $A$ except its transitions are labeled with
$() \in \mathbb{B}^0$.

For WS1S we can define the same language with the formula

\[
\mathit{alternate}_{WS1S}(X) \stackrel{\mathrm{def}}{=}
  \exists n.\ (\forall p.\ n < p \rightarrow \neg X(p)) \wedge (2) \wedge (3).
\]

The only difference is that to state that $n$ is the last position we require
that $X$ contains no positions greater than $n$. The language
$\mathcal{L}_{WS1S}(\mathit{alternate}_{WS1S}(X))$ is $1(01)^*0^*$ and, taking
$\psi$ to be $\mathit{alternate}_{WS1S}$,
$\mathcal{L}_{WS1S}(\psi(X)) / {\sim}$ is $1(01)^*$. In contrast to M2L-Str,
if we existentially quantify the variable $X$ in
$\mathit{alternate}_{WS1S}(X)$, then we obtain a valid formula and its
automaton is depicted in Figure 1(c).

² The less-than relation $<$ is definable in M2L-Str, WS1S, and S1S
(introduced in §4.3).



4 Bounded Model Construction 

In this section we present bounded model construction, which can generate
counter-examples for non-theorems nonelementarily faster than its
automata-theoretic counterpart. We show this for M2L-Str and give negative
results, showing the impossibility of such procedures, for other monadic
logics.

The problem we analyze is how to generate counter-examples of a given size
and do this quickly (elementary!) with respect to the size parameter. We
express this in the format of a parameterized complexity problem (cf. [1]).
For $L$ either M2L-Str or WS1S, we define:

Definition 3

Bounded Model Construction for L (BMC(L))

Instance: A formula $\phi$ and a natural number $k$.

Parameter: $k$.

Question: Does $\phi$ have a satisfying word model of length $k$ with respect
to $L$? (That is, is there a word $w$ of length $k$ with
$w \models_L \phi$?)

4.1 Bounded Model Construction for M2L-Str 

We proceed by defining a family of functions
$(\lceil \cdot \rceil_k)_{k \in \mathbb{N}}$ that transforms MSO-formulae into
quantified Boolean formulae such that there is a word model of length $k$ for
$\phi$ iff $\lceil \phi \rceil_k$ is satisfiable. The size of the resulting
formula is polynomial in the size of $\phi$ and $k$.

To simplify matters, we reduce MSO to a minimal kernel, called MSO$_0$, which
is as expressive as MSO. The language MSO$_0$ has the grammar:

\[
\phi ::= \mathit{Succ}(X, Y) \mid X \subseteq Y \mid \neg\phi \mid
  \phi \vee \phi \mid \exists X.\,\phi, \qquad X, Y \in V_2 .
\]

$\mathit{Succ}(X, Y)$ means that $X$ and $Y$ are singletons $\{p\}$ and
$\{q\}$, where $q = p + 1$. The symbol $\subseteq$ denotes the subset
relation. Note that first-order variables are omitted and can be encoded as
singletons. There is a simple polynomial time translation from MSO formulae
into MSO$_0$ [20].



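Translation to QBL  Let $k \in \mathbb{N}$ be fixed. We now describe how to
calculate the QBL formula $\lceil \phi \rceil_k$ for a MSO$_0$-formula
$\phi$. The idea is simple: a set $M \subseteq [k]$ can be represented by $k$
Boolean variables $x_0, \ldots, x_{k-1}$ such that $x_i = 1$ iff $i \in M$.
Building on this, we encode relations between finite sets and formulae over
these relations.

Let $V_0 = \{x_i^j \mid i, j \in \mathbb{N}\}$ be a set of Boolean variables
and singleton be the proposition

\[
\mathit{singleton}(x_0, \ldots, x_{k-1}) \stackrel{\mathrm{def}}{=}
  \bigvee_{0 \leq i \leq k-1} \Bigl( x_i \wedge
  \bigwedge_{\substack{0 \leq j \leq k-1 \\ j \neq i}} \neg x_j \Bigr) .
\]

The mapping $\lceil \cdot \rceil_k$ is inductively defined as follows:

Definition 4 (Translation)

\[
\begin{array}{rcl}
\lceil X_m \subseteq X_n \rceil_k & = &
  \bigwedge_{0 \leq i \leq k-1} (x_i^m \rightarrow x_i^n) \\
\lceil \mathit{Succ}(X_m, X_n) \rceil_k & = &
  \mathit{singleton}(x_0^m, \ldots, x_{k-1}^m) \wedge
  \mathit{singleton}(x_0^n, \ldots, x_{k-1}^n) \wedge
  \bigvee_{0 \leq i < k-1} (x_i^m \wedge x_{i+1}^n) \\
\lceil \phi_1 \vee \phi_2 \rceil_k & = &
  \lceil \phi_1 \rceil_k \vee \lceil \phi_2 \rceil_k \\
\lceil \neg\phi \rceil_k & = & \neg \lceil \phi \rceil_k \\
\lceil \exists X_m.\,\phi \rceil_k & = &
  \exists x_0^m, \ldots, x_{k-1}^m.\ \lceil \phi \rceil_k
\end{array}
\]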
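The translation of Definition 4 can be sketched as a function from MSO$_0$
formulae to QBL formulae, together with a naive QBL evaluator for testing. The
tuple encoding (`"sub"`, `"succ"`, `"ex"`), the helper names, and the
representation of the Boolean variable $x_i^m$ as the pair `(m, i)` are all
assumptions of this sketch rather than details of the authors'
implementation.

```python
from functools import reduce


def big(op, parts, unit):
    """Fold formulae with a binary connective; `unit` covers the empty case."""
    parts = list(parts)
    return reduce(lambda a, b: (op, a, b), parts) if parts else unit


def var(m, i):
    """Boolean variable x_i^m: position i belongs to the set X_m."""
    return ("var", (m, i))


def singleton(m, k):
    """Exactly one of x_0^m, ..., x_{k-1}^m is true."""
    return big("or", [big("and", [var(m, i)] +
                          [("not", var(m, j)) for j in range(k) if j != i],
                          True)
                      for i in range(k)], False)


def translate(f, k):
    """Map an MSO_0 formula to a QBL formula for word length k."""
    op = f[0]
    if op == "sub":                      # X_m is a subset of X_n
        _, m, n = f
        return big("and", [("or", ("not", var(m, i)), var(n, i))
                           for i in range(k)], True)
    if op == "succ":                     # Succ(X_m, X_n)
        _, m, n = f
        step = big("or", [("and", var(m, i), var(n, i + 1))
                          for i in range(k - 1)], False)
        return ("and", singleton(m, k), ("and", singleton(n, k), step))
    if op == "not":
        return ("not", translate(f[1], k))
    if op == "or":
        return ("or", translate(f[1], k), translate(f[2], k))
    if op == "ex":                       # one Boolean quantifier per position
        _, m, body = f
        g = translate(body, k)
        for i in reversed(range(k)):
            g = ("exists", (m, i), g)
        return g
    raise ValueError(f"unknown MSO_0 construct: {op!r}")


def holds(f, tau):
    """Naive QBL evaluation under the Boolean substitution tau."""
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "var":
        return tau[f[1]]
    if op == "not":
        return not holds(f[1], tau)
    if op == "and":
        return holds(f[1], tau) and holds(f[2], tau)
    if op == "or":
        return holds(f[1], tau) or holds(f[2], tau)
    if op == "exists":
        return any(holds(f[2], {**tau, f[1]: b}) for b in (False, True))
    raise ValueError(f"unknown QBL construct: {op!r}")
```

As a usage example, the closed formula
`("ex", "X", ("ex", "Y", ("succ", "X", "Y")))` translates to an unsatisfiable
QBL formula for `k = 1` and to a satisfiable one for `k = 2`, matching the
fact that a successor pair needs at least two positions.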

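Definition 5  For a substitution $\sigma : V_2 \to \mathcal{F}(\mathbb{N})$ we
define the Boolean substitution $\overline{\sigma} : V_0 \to \mathbb{B}$ by
$\overline{\sigma}(x_i^m) = 1$ iff $i \in \sigma(X_m)$.

Lemma 1  Let $\sigma$ be a substitution and $k \in \mathbb{N}$. Then
$\sigma^k \models_{M2L} \phi$ iff
$\overline{\sigma} \models_{QBL} \lceil \phi \rceil_k$.

Proof. By induction on the construction of $\phi$.

We first establish the claim for atomic formulae. To begin with,
$\sigma^k \models_{M2L} X_m \subseteq X_n$ iff for all $i$,
$0 \leq i \leq k-1$, $i \in \sigma(X_m)$ implies that $i \in \sigma(X_n)$,
which is equivalent to $\overline{\sigma} \models_{QBL}
\bigwedge_{0 \leq i \leq k-1} (x_i^m \rightarrow x_i^n)$. Similarly, if
$\sigma^k \models_{M2L} \mathit{Succ}(X_m, X_n)$, then $\sigma(X_m)$ and
$\sigma(X_n)$ are singletons. Moreover, $\sigma(X_m)$ contains a natural
number $p$, with $0 \leq p < k-1$, whose successor $p + 1$ is in
$\sigma(X_n)$. Hence there is some $i$, where $0 \leq i < k-1$, such that
$\overline{\sigma} \models_{QBL} x_i^m \wedge x_{i+1}^n$, so
$\overline{\sigma} \models_{QBL}
\bigvee_{0 \leq i < k-1} (x_i^m \wedge x_{i+1}^n)$; the converse is argued
similarly.

In the inductive step we consider only the case where $\phi$ is of the form
$\exists X_m.\,\psi$, as the remaining cases are straightforward. By
Definition 1, $\sigma^k \models_{M2L} \exists X_m.\,\psi$ iff there is some
set $M \subseteq [k]$ such that $(\sigma[M/X_m])^k \models_{M2L} \psi$. From
the induction hypothesis, $(\sigma[M/X_m])^k \models_{M2L} \psi$ iff
$\delta \models_{QBL} \lceil \psi \rceil_k$, where
$\delta = \overline{\sigma[M/X_m]}$. Note that
$\delta = \overline{\sigma}[b_0/x_0^m, \ldots, b_{k-1}/x_{k-1}^m]$, where
$b_i = 1$ iff $i \in M$, for $0 \leq i \leq k-1$. Further,
$\delta \models_{QBL} \lceil \psi \rceil_k$ for some such $M$ iff
$\overline{\sigma} \models_{QBL}
\exists x_0^m, \ldots, x_{k-1}^m.\ \lceil \psi \rceil_k$. Thus
$\sigma^k \models_{M2L} \exists X_m.\,\psi$ iff
$\overline{\sigma} \models_{QBL}
\exists x_0^m, \ldots, x_{k-1}^m.\ \lceil \psi \rceil_k$. □

Observe that given a Boolean substitution $\tau$, it is trivial to define a
MSO substitution $\sigma$ where $\overline{\sigma} = \tau$, namely by
stipulating that $\sigma(X_i) = \{j \mid \tau(x_j^i) = 1\}$. Hence, from the
above Lemma we can conclude: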

Theorem 1 (Correctness)  Let $\phi$ be a MSO formula. For $k \in \mathbb{N}$,
there exists a MSO substitution $\sigma$ where $\sigma^k \models_{M2L} \phi$
iff there exists a Boolean substitution $\tau$ where
$\tau \models_{QBL} \lceil \phi \rceil_k$. Moreover, $\phi$ is valid in
M2L-Str iff for all $k \geq 0$, the QBL formula $\lceil \phi \rceil_k$ is
valid.

We define the size of a formula (in any of the logics we consider) as the
number of symbols occurring in its string representation. Exploiting the fact
that satisfiability for QBL is PSPACE-complete, we prove:

Theorem 2 (Complexity)  BMC(M2L-Str) is PSPACE-complete.

Proof. Let $\phi$ and $k$ be a problem instance. The size of
$\lceil \phi \rceil_k$ is $O(k^2 |\phi|)$. It follows that BMC(M2L-Str) can be
reduced in polynomial time to satisfiability in QBL, which establishes
membership in PSPACE.

To prove PSPACE-hardness, we show that satisfiability for QBL can be reduced
in log-space to BMC(M2L-Str). Let $E$ be a fresh second-order variable and
empty be the M2L-Str proposition defined by
$\mathit{empty}(X) \stackrel{\mathrm{def}}{=} \forall Y.\, X \subseteq Y$. We
encode each Boolean variable $x$ with a M2L-Str variable $X$. For a QBL
formula $\phi$, let $\phi'$ be the M2L-Str formula obtained from $\phi$ as
follows: replace occurrences of Boolean variables $x$ by $X \subseteq E$, and
replace the Boolean quantifiers as well as the propositional connectives by
the corresponding quantifiers and connectives of M2L-Str. Now, the encoding of
$\phi$ in M2L-Str is the formula
$\exists E.\, \mathit{empty}(E) \wedge \phi'$. For example, the QBL formula
$\forall x\, \exists y.\, x \vee y$ is encoded as
$\exists E.\, \mathit{empty}(E) \wedge \forall X.\, \exists Y.\,
X \subseteq E \vee Y \subseteq E$. Under this encoding it is only relevant
whether or not a second-order variable is interpreted by the empty set. We
immediately conclude that a QBL formula is satisfiable iff its encoding has a
word model of length 1. □



4.2 Bounded Model Construction for WS1S

The previously given translation cannot be employed for WS1S. If $\phi$ is
the formula $\exists X_m.\, \forall X_n.\, X_n \subseteq X_m$, the
translation yields the quantified Boolean formula
$\exists x_0^m, \ldots, x_{k-1}^m.\ \forall x_0^n, \ldots, x_{k-1}^n.\
\bigwedge_{0 \leq i \leq k-1} (x_i^n \rightarrow x_i^m)$, which is valid for
every $k$, whereas $\phi$ is unsatisfiable in WS1S. We now prove that there is
no translation that will yield an elementary bounded model construction
procedure.

Theorem 3  BMC(WS1S) is nonelementary.

Proof. For a closed formula $\phi$, $\sigma \models_{WS1S} \phi$ iff
$\sigma' \models_{WS1S} \phi$ for all substitutions $\sigma$ and $\sigma'$,
i.e., the satisfiability of a closed formula does not depend on the
substitution. Hence, every closed WS1S formula is either valid or
unsatisfiable. Equivalently, for $\phi$ a closed formula, either
$\mathcal{L}_{WS1S}(\phi) = ()^*$ or $\mathcal{L}_{WS1S}(\phi) = \emptyset$.
Consequently, if a closed formula $\phi$ has a word model, then
$\mathcal{L}_{WS1S}(\phi) = ()^*$ and therefore $\phi$ is valid. In other
words, computing a word model of any length for $\phi$ is equivalent to
checking $\phi$'s validity. □

This proof can be easily adapted for other monadic logics. For example,
WFO[<] is the first-order fragment of WS1S augmented by the relation $<$.
Meyer showed in [16] that this logic is nonelementary; this result, combined
with the above argument, shows that bounded model construction for this logic
(BMC(WFO[<])) is also nonelementary.




108 A. Ayari and D. Basin 



The reader may wonder what causes these differences. We can gain some insight
by comparing semantics. From the semantics of M2L-Str, $\phi(X)$ has a word
model of length $k$ iff $\exists X.\, \phi(X)$ has a word model of length
$k$. This semantic property was employed in the proof of Lemma 1, where in
order to use the induction hypothesis we require that the witness set $M$ is a
subset of $[k]$. Unfortunately, this property fails for WS1S. As can be seen
in Figure 1, existential quantification can change the size of the minimal
word model in WS1S. A more dramatic example is the family of formulae
(written here with sugared syntax)
$\phi_n(X) \stackrel{\mathrm{def}}{=} X(n)$, for $n \in \mathbb{N}$. The
minimal length word model for $\phi_n(X)$ is $n$, whereas it is 0 for
$\exists X.\, \phi_n(X)$. In general, to determine if a formula has a small,
e.g., length 0, word model, we must consider word models for its subformulae
that are nonelementarily larger in the worst case.



4.3 Bounded Model Construction for S1S

Here we consider monadic logics over infinite words. We start with the logic
S1S, which is closely related to WS1S and differs only by allowing infinite
subsets of $\mathbb{N}$ as interpretations for second-order variables. A
substitution in S1S can be encoded as an infinite word and Büchi showed in
[6] that S1S exactly captures the $\omega$-regular languages. In doing so, he
provided an effective nonelementary transformation of S1S formulae into Büchi
automata.

Here we prove a negative result analogous to the previous one: there is no
elementary BMC procedure for S1S. This problem must first be properly
defined, since bounded model construction is defined above for finite words,
and here there are only infinite word models.

We begin with some basic definitions and results for $\omega$-regular
languages [20].

From the definition of Büchi acceptance, every nonempty $\omega$-regular
language contains an infinite word
$w \stackrel{\mathrm{def}}{=} u v v \ldots$, for $u$ and $v$ finite words in
$\Sigma^*$. If $uv$ is of length $k$, we say the word $w$ contains (or is) a
lasso of length $k$, consisting of a prefix $u$ and a loop $v$. Consequently,
a S1S formula $\phi$ is satisfiable iff it has a satisfying word model that is
a lasso of length $k$, for some $k \in \mathbb{N}$.

Using the fact that lassos can be finitely represented, we define an analog of
BMC for S1S.

Definition 6

Bounded Model Construction for S1S (BMC(S1S))

Instance: A formula $\phi$ and a natural number $k$.

Parameter: $k$.

Question: Does $\phi$ have a satisfying lasso of length $k$ in S1S?

We now prove that no elementary BMC procedure for S1S exists.

Theorem 4  BMC(S1S) is nonelementary.

Proof. Our proof uses an embedding of WS1S in S1S. For this, we first show
that finiteness is definable in S1S. Let finite and Finite be the following
two propositions:

\[
\mathit{finite}(X) \stackrel{\mathrm{def}}{=}
  \exists m.\, \forall p.\ X(p) \rightarrow p < m
\qquad\text{and}\qquad
\mathit{Finite}(\phi) \stackrel{\mathrm{def}}{=}
  \bigwedge_{X \in \mathit{freevars}(\phi)} \mathit{finite}(X) .
\]

We define an embedding function $[\![\cdot]\!]$ from WS1S into S1S by
$[\![\phi]\!] = \mathit{Finite}(\phi) \wedge \lceil \phi \rceil$, where
$\lceil \phi \rceil$ is:

\[
\begin{array}{rclrcl}
\lceil X(t) \rceil & = & X(t) &
\lceil \exists p.\,\phi \rceil & = & \exists p.\, \lceil \phi \rceil \\
\lceil \phi_1 \vee \phi_2 \rceil & = &
  \lceil \phi_1 \rceil \vee \lceil \phi_2 \rceil &
\lceil \exists X.\,\phi \rceil & = &
  \exists X.\ \mathit{finite}(X) \wedge \lceil \phi \rceil
\end{array}
\]

We can show by induction over the structure of formulae that $[\![\cdot]\!]$
preserves satisfiability. Namely, a formula $\phi$ has a word model of length
$k$ in WS1S iff $[\![\phi]\!]$ has a satisfying lasso of length $k$ in S1S.

The function that assigns to each BMC(WS1S)-instance $(\phi, k)$ the
BMC(S1S)-instance $([\![\phi]\!], k)$ reduces, in polynomial time, BMC(WS1S)
to BMC(S1S). Using Theorem 3, the claim follows. □

Again, we can apply the same proof idea to other monadic logics. Let FO[<] be
the first-order fragment of S1S augmented by the less-than relation $<$. A
similar proof establishes that bounded model construction for FO[<], i.e.,
BMC(FO[<]), is nonelementary by reducing BMC(WFO[<]) to BMC(FO[<]).

5 Experimental Results 

We have implemented bounded model construction for M2L-Str and describe here
experimental results. Our system takes as input a natural number and a
formula written in a “sugared” version of the syntax of §3.1. It first
calculates from the inputs a QBL formula as described in §4.1. Second, it
transforms the QBL formula into Boolean logic by eliminating the universal
quantifiers (replacing $\forall x.\,\phi$ with
$\phi[\mathit{true}/x] \wedge \phi[\mathit{false}/x]$) and dropping the
remaining existential quantifiers (assuming all bound variables are uniquely
named). Third, the resulting Boolean formula is converted into conjunctive
normal form. Finally, the result is tested for satisfiability using the Sato
system [23], which is an efficient implementation of the Davis-Putnam
procedure. Of course, with minor changes other satisfiability checkers could
be used.
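The second step of this pipeline can be sketched as follows. This is not the
authors' implementation: the tuple encoding is hypothetical, a brute-force
search stands in for the Sato solver, and the conversion to conjunctive normal
form is skipped. Two details are assumptions made to keep the sketch sound on
arbitrary input: each copy of an existential variable created by expanding an
enclosing universal quantifier receives a fresh name, and polarity is tracked
so that a quantifier under a negation is treated as its dual.

```python
from itertools import count, product


def expand(f, env=None, fresh=None, positive=True):
    """Eliminate quantifiers: expand universal ones, drop existential ones."""
    env = {} if env is None else env
    fresh = count() if fresh is None else fresh
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "var":
        v = env.get(f[1], f[1])          # constant, renamed, or free variable
        return v if isinstance(v, bool) else ("var", v)
    if op == "not":
        return ("not", expand(f[1], env, fresh, not positive))
    if op in ("and", "or"):
        return (op, expand(f[1], env, fresh, positive),
                expand(f[2], env, fresh, positive))
    if op not in ("forall", "exists"):
        raise ValueError(f"unknown connective: {op!r}")
    _, x, body = f
    if (op == "exists") == positive:
        # Behaves existentially here: drop it, using a fresh variable name.
        return expand(body, {**env, x: f"{x}#{next(fresh)}"}, fresh, positive)
    # Behaves universally here: substitute both truth values.
    join = "and" if op == "forall" else "or"
    return (join, expand(body, {**env, x: True}, fresh, positive),
            expand(body, {**env, x: False}, fresh, positive))


def variables(f):
    if isinstance(f, bool):
        return set()
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))


def evaluate(f, assignment):
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "var":
        return assignment[f[1]]
    if op == "not":
        return not evaluate(f[1], assignment)
    if op == "and":
        return evaluate(f[1], assignment) and evaluate(f[2], assignment)
    return evaluate(f[1], assignment) or evaluate(f[2], assignment)


def satisfiable(qbf):
    """Brute-force satisfiability of the expanded, quantifier-free formula."""
    g = expand(qbf)
    names = sorted(variables(g))
    return any(evaluate(g, dict(zip(names, bits)))
               for bits in product((False, True), repeat=len(names)))


XOR = ("or",
       ("and", ("var", "x"), ("not", ("var", "y"))),
       ("and", ("not", ("var", "x")), ("var", "y")))
```

For instance, `satisfiable(("forall", "x", ("exists", "y", XOR)))` holds
because the two copies of `y` may take different values, whereas
`satisfiable(("exists", "y", ("forall", "x", XOR)))` does not. Each expanded
universal quantifier doubles the size of its body, which is consistent with
the remark in §6 that most of the running time is spent in this translation.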

We used the Mona system [14] for comparison. Mona is an automata-based
implementation of decision procedures for the monadic logics M2L-Str, WS1S
and their generalizations to trees. Mona compiles a formula into a minimal
deterministic automaton, which it represents and manipulates using BDDs. Over
the last few years Mona has been continually improved and is now highly
optimized (we use version 1.4). For our tests, we used a 450 MHz Sun Sparc
Ultra workstation.

For completeness, we tested examples ranging from those that are easy for
Mona to those that are difficult. Table 1 presents tests on several easy
examples. In all of these, we find counter-examples (of the same length as
Mona’s) when they exist. However, for these examples, Mona is much faster.

The first example is a parameterized n-bit ripple-carry adder, taken from
[2]. The input formula states the equivalence between a structural
description of the parameterized adder family (described at the gate level)
with a behavioral description, describing how bit-strings are added. We
checked this equivalence for k = 2, 4, and 6. The second example involves a
structural specification of a sequential D-type flip-flop circuit, and its
behavioral model. The circuit is built from 6 nand-gates, each of which has a
(unit) time-delay. We tested the correctness of this circuit with respect to a
behavioral description proposed by Gordon in [10]. As has been discovered by
[2,22], the specification has a subtle bug. Both Mona and our system find a
(different) counter-example of length 8. The third example is a buggy mutual
exclusion protocol taken from [3]; both systems successfully find a trace
showing that the critical sections can be simultaneously accessed.

Next we consider some examples that are difficult for Mona. First, we
consider reasoning about two concurrent processes that increment a shared
integer variable N by each executing the program: Load Reg N, Add Reg 1,
Store Reg N. If we assume an interleaving semantics, it is possible that N is
incremented by either 1 or 2. We model the two parallel processes in M2L-Str
and assert (incorrectly) that after execution N is incremented by 1. Table 4
gives the results, where we scale the problem by considering registers of
different bit-width. For more than 4 bits, Mona runs out of memory as the
automata accepting the computations (traces) of the two systems grow
exponentially.
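The claim that N ends up incremented by either 1 or 2 can be checked with a
small simulation that enumerates all interleavings of the two three-instruction
programs. The simulation is an illustrative sketch with hypothetical names; it
uses unbounded integers rather than the fixed bit-widths of the experiment,
and it is unrelated to the M2L-Str model used there.

```python
from itertools import combinations

PROGRAM = ("load", "add", "store")  # Load Reg N; Add Reg 1; Store Reg N


def run(schedule, n=0):
    """Execute both processes in the given order; return the final N."""
    reg = [0, 0]  # private register of each process
    pc = [0, 0]   # program counter of each process
    for p in schedule:
        instr = PROGRAM[pc[p]]
        if instr == "load":
            reg[p] = n
        elif instr == "add":
            reg[p] += 1
        else:  # store
            n = reg[p]
        pc[p] += 1
    return n


def schedules():
    """All interleavings: choose which 3 of the 6 steps belong to process 0."""
    for slots in combinations(range(6), 3):
        yield [0 if i in slots else 1 for i in range(6)]


OUTCOMES = {run(s) for s in schedules()}
```

Running the processes one after the other, as in `run([0, 0, 0, 1, 1, 1])`,
yields 2, while a schedule in which both processes load before either stores,
such as `run([0, 1, 0, 1, 0, 1])`, yields 1. Over all 20 interleavings,
`OUTCOMES` is `{1, 2}`.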

Finally, we consider two sequential circuits: a counter and a barrel shifter, 
which we parameterize in the width of the data-path. Tables 3 and 2 give the 
results of these experiments for data-paths of various widths. In the first exam- 
ple, the n-bit counter has two selection lines and n data lines. At each point in 
time the value of the data lines is incremented, reset or unchanged depending 
on the value of the selection lines. We verify this with respect to an incorrect 
specification, which asserts that, after eight time units, the data line is always 
incremented. In our experiments, Mona quickly runs into state explosion pro- 
blems, whereas even for large data-paths, we can still generate counter-examples 
quickly. Our procedure finds that, for data-paths between 4 and 40, the short 
counter-examples have length k = 8. The results for the barrel shifter are similar. 



6 Conclusion and Future Work 

We have explored the possibility of providing more efficient alternatives to
counter-example generation than using standard automata-theoretic decision
procedures. We have obtained positive results, both in theory and practice,
for M2L-Str, and negative results for WS1S, S1S, WFO[<], and FO[<]. Hence, at
least for counter-example generation, M2L-Str is the superior choice.

The theoretical issues seem clear-cut. On the experimental side there is still
work to do. In our experiments, most of the time is spent translating QBL
formulae into Boolean logic formulae and the resulting normal form calculation.
Recent work on testing satisfiability for QBL [7] could offer improvements here;
investigating this remains as future work.

Table 1. Simple Examples

Table 2. Barrel Shifter

Table 3. Counter

Table 4. Parallel Instruction

Abstract. Given a Free BDD for the characteristic function of an input-output
relation $T(x, y)$, we show how to construct a combinational logic circuit
satisfying that relation. Such relations occur as environmental constraints
for module specifications, as parts of proof strategies, or can be computed
from existing circuits, e.g., by formal analysis of combinational cycles. The
resulting circuit $C$ can be used for further analysis, e.g. symbolic
simulation, or to reformat a circuit as a logic optimization tactic.

The constructed circuit includes supplementary parametric inputs to allow all legal 
outputs to be generated in the case that T is non-deterministic. The structure of 
the circuit is isomorphic to that of the BDD for T, and hence is as compact as the 
BDD. In particular, when T represents a relation between bit vector integer values 
definable in Presburger arithmetic, the constructed circuit will have a regular bit 
slice form. 



1 Introduction 

A general Boolean relation $T(x, y)$ admits multiple interpretations and
representations arising in various contexts. We consider the case when the
$x$ variables are considered to be inputs presented to some system component,
and $y$ are output variables that the system component generates, subject to
the constraint that the given $x$ and the generated $y$ must satisfy the $T$
relation. The problem we address is to construct a combinational circuit
satisfying the input-output relation represented by a Free Binary Decision
Diagram (FBDD). Since an FBDD is a generalization of the more common Ordered
BDD (OBDD), such a construction will also work for OBDDs.

The primary context we consider is that of a verification constraint or precondition. 
In this case, the input variables x encode the state of some module under verification, 
and the output variables y provide the input stimulus to that module. The set of stimuli 
to be presented to the module, in general or for a particular verification task, may depend 
on the current state of the module. For example, Yuan et al. [19] describe a verification 
methodology where a module’s environment is specified as a constraint which can depend 
on the state of the module. Jain and Gopalakrishnan’s methodology [12] also includes 
“action” constraints which depend on the system state. These constraints are used as a 
verification tactic rather than a specification of the module’s environment. Aagaard et 
al. [1] also use tactical constraints, but their constraints do not depend on the state of the 
module and thus are a special case of the more general one we consider. 

Input-output relations may arise in other contexts. For example, such a relation may
be derived from a combinational logic circuit, summarizing the behavior of the circuit.
In such a case the input-output relation will be complete (for each x at least one y
exists that satisfies T) and deterministic (for each x at most one y exists that satisfies
T). One can also analyze the context of a subcircuit to construct a nondeterministic
input-output relation for the allowable behaviors of the subcircuit which will preserve 
the overall behavior of the containing complete circuit [18]. Another use of input-output 
relations is in the analysis of cyclic circuits built of combinational gates. Such circuits 
may reliably settle their output values to a deterministic function of their inputs despite 
their cyclic topology. The constructivity analyses of Shiple [17] and Namjoshi et al. [15] 
generate Boolean relations (output bit by output bit in Shiple’s analysis) that represent 
the combinational function of cyclic circuits, along with checking whether those cyclic 
circuits are indeed not state holding. 

Input-output relations are also used as a means to express the intended function of 
a machine being designed in high level languages such as SMV [14]. One advantage of 
using relations for design is the natural representation of non-determinism. 

A Boolean relation $T(x, y)$ may be represented in various ways. An OBDD [8]
can be used to represent the characteristic function of the relation. A
generalization of the OBDD representation is the FBDD [9], where variables may
occur in different orders on different paths from root to terminal. Both
FBDDs and OBDDs are constrained so that variables occur at most once on any
path. The added flexibility of FBDDs permits a much more compact
representation for some relations.

Another possible representation is as a multiple output combinational logic
circuit with inputs $x$ and outputs $y$. If the relation $T$ is complete and
deterministic then the required values of the outputs are well defined, and
can be expressed as the positive cofactors of the bitwise characteristic
functions:

\[
\bigl( \exists_{j \neq i}\, y_j .\; T(x, y) \bigr) \big|_{y_i}
\]

In the general case, a given value of $x$ might be related to multiple $y$
values or to none. One can supplement a circuit with two additional features
to allow it to accurately represent a general input-output relation
$T(x, y)$. To handle incompleteness, one can add an extra output
$v(x) = \exists y.\, T(x, y)$ to the circuit, indicating whether any $y$
exists that is related to a given input $x$. To handle non-determinism one can
add extra parametric inputs $p$ to the circuit, so that for every output value
$y$ that satisfies $T(x, y)$ for a given input $x$, there is some value of $p$
such that the circuit will generate $y$ when applied to the inputs $(x, p)$.
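The roles of the validity output $v$ and the parametric inputs $p$ can be
illustrated by a brute-force sketch. It is not the FBDD-based construction of
this paper: the relation is given as a Python predicate, the parameter is a
plain integer rather than a vector of input bits, and the names `outputs`,
`valid`, `circuit` and the sample relation `T` are hypothetical.

```python
from itertools import product


def outputs(T, x, m):
    """All m-bit output vectors y related to the input x by T."""
    return [y for y in product((0, 1), repeat=m) if T(x, y)]


def valid(T, x, m):
    """The extra output v(x): does any y with T(x, y) exist?"""
    return bool(outputs(T, x, m))


def circuit(T, x, p, m):
    """Select one legal output for x using the parameter p.

    Every legal y is produced by some p; returns None when v(x) is false.
    """
    ys = outputs(T, x, m)
    return ys[p % len(ys)] if ys else None


def T(x, y):
    """Sample relation: x0 must be 1 and at least one output bit is 1."""
    return x[0] == 1 and (y[0] == 1 or y[1] == 1)
```

For the sample relation, `valid(T, (0,), 2)` is false, so the relation is
incomplete, while for the input `(1,)` the parameter values 0, 1, and 2 select
the three legal outputs `(0, 1)`, `(1, 0)`, and `(1, 1)`, which shows the
non-determinism that the parametric inputs are meant to index.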

A multiple output combinational logic circuit provides a broadly applicable
representation for $T$. As Jain and Gopalakrishnan [12], Aagaard et al. [1],
and Bertacco et al. [3] point out, symbolic simulation is a powerful technique
for exploring the behavior of a circuit. Symbolic simulation can be directly
applied to the combinational circuit representation of $T$. Other state
exploration engines such as those based on SAT [4] or ATPG [5] generally
accept combinational logic circuits as a problem representation. Logic
emulation hardware is another state exploration mechanism for which a
combinational logic circuit is an ideal problem representation.

When $T$ represents a constraint on the inputs to some module under
verification, the outputs of the circuit we construct for $T$ would be
connected to the inputs of the module, and the composite circuit submitted to
state exploration. While the circuit we construct does not reduce the number
of input variables compared to the unconstrained circuit, the constraints on
the inputs can prevent false error reports that could have occurred if
improper input stimuli were allowed to propagate into the module [19]. The
constraints may also improve the efficiency of state exploration techniques
such as symbolic simulation by reducing BDD sizes [1].

A combinational logic circuit also provides a structure for implementation in digital 
hardware. In this case, the relation to be implemented should be complete, so the v 
output should be constant 1 and can be ignored. An implementation will also generally 
be deterministic. If the input-output relation to be implemented is non-deterministic, the 
supplementary inputs p in the non-deterministic circuit representation can be connected 
to arbitrary constants or variable signals to form a deterministic circuit. 

Our contribution is an elegant translation procedure that constructs a combinational 
logic circuit, with inputs x and p and outputs y and v, from a general input-output 
relation T(x, y) represented as the Free BDD of its characteristic function. The size of 
the circuit is proportional to the number of nodes in the Free BDD. When the input- 
output relation is non-deterministic, the supplementary parametric inputs p are used to 
index all the output y values related to a given input x. 

In the remainder of this paper, we will first discuss the details of the construction 
procedure. We will then demonstrate that the circuit constructed does effectively repre- 
sent the input-output relation. Next we address the compactness of the circuit. Finally 
we review related work and conclude. 



2 Circuit Construction Procedure 

Given an FBDD for an input-output relation T(x, y), we construct a circuit implementing 
T that has the same top level topology as the FBDD. First we describe the high level 
structure and signal flow of the circuit. We will then discuss the internal details of each 
of the modules that compose the circuit. 

2.1 High Level Signal Flow 

For every node in the BDD for T there is an instantiation of a basic module. The basic 
modules come in two types, corresponding to the two classes of variables that occur in 
the BDD. There is an input module that is used in place of BDD nodes labeled by input 
variables, and an output module used in place of nodes labeled by output variables. The 
connections between modules are created to match the edges between the corresponding 
BDD nodes. 

We describe the construction process in terms of an example. Suppose we are given 
the BDD shown in Figure 1. The circuit shown in Figure 2 provides a combinational 
logic representation for that relation. The inputs to the circuit are at the bottom of the 
figure, with two main input variables x_1 and x_2, and two supplementary parametric 
inputs p_1 and p_2. There are many possible ways to parameterize or encode the y in 
terms of a set of parameters p. In our encoding we use one parameter input bit for each 
output bit. The outputs are at the top of the figure, with the two main outputs y_1 and 
y_2 and the supplementary output v. 

Just as there are seven nodes in the BDD, there are seven major modules in the 
circuit. In this top level diagram, each connection point on the modules is marked with 
an arrow to indicate the direction of signal flow. Disconnected inputs are driven by logic 
value 0. Each edge in the BDD corresponds to two wires in the circuit, one that flows 
from terminal to root and one that flows from root to terminal. 

The overall logic flow in the circuit can be broken into three phases. The first phase 
flows from the terminal nodes to the root node, the second from the root node back to the 
terminal nodes, and the third phase across all the nodes labelled by the same variable. 

In the first phase, constant 0 and 1 values corresponding to the terminal nodes start to 
flow toward the root, combining with the circuit inputs x along the way. This terminal-to- 
root flow results in the v signal which is the auxiliary circuit output, indicating whether 
any valid circuit outputs are possible for the particular values being presented at the inputs 
x. In this first phase, each node will receive signals from the modules corresponding to 
the destination nodes of its two outgoing edges, indicating whether those two nodes have 
any path to the 1 terminal consistent with the presented values of the primary circuit 
inputs. 

In the second phase, signals propagate from the root to the terminals to activate a 
single path from root to terminal. This path is steered by the primary circuit inputs and 
also by the auxiliary inputs. Each node receives a signal that indicates whether any of its 
incoming edges are active. If an incoming edge is active, then the current node is active 
and must choose which outgoing edge to activate. If the node is labeled by an input 
variable, then the node is constrained to choose as directed by the value of that variable. 
If the node is labeled by an output variable, then the node will choose as directed by 
the corresponding auxiliary input variable if possible. During the first phase of signal 
propagation the node received signals from the two destination nodes which indicated 
which of them had possible paths to the 1 terminal. The node can then use these signal 
values to be sure to choose a valid value for the primary circuit output signal, one that 
can form part of an unbroken path from the root of the BDD to the 1 terminal. 

In the third phase, the values of the circuit outputs y are computed, based on the 
activated path. If the activated path includes a node labelled by a particular output bit y_i, 
then the edge of that node followed by the path will fix the value of the output bit. If the 
activated path does not include y_i, then the value of the corresponding parametric input 
p_i is used. The modules substituted for the nodes labelled by a single output variable y_i 
are connected together in a serial chain. There are two logic signals propagated along 
this chain. The order of the nodes in the chain is arbitrary. This chain gathers information 
to compute the value of y_i. For each output variable we also include a single multiplexor 
to handle the case where the activated path does not include a node labelled by that 
variable. So in this circuit there are two multiplexors, shown near the top of Figure 2, 
corresponding to the two output variables y_1 and y_2. If the BDD for an input-output 
relation doesn’t include any nodes at all for a particular output variable, then the value of 
that output variable is not constrained by the inputs, and can simply be copied directly 
from the corresponding supplementary parameter variable. In this case no multiplexor 
is needed. 

Another implementation detail arises in handling multiple edges leading to the same 
destination node. In the circuit we build, this is translated, in part, to a collection of 
signals whose disjunction drives the module corresponding to the destination node. We 
implement this here with a chain of OR gates, one in each of the modules corresponding 
to the sources of the edges which all have the same destination node. 
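
The three phases can be mimicked in software. The sketch below is a behavioural model under stated assumptions, not the gate-level circuit: the small FBDD in `NODES`, the reference relation `T`, and the function name `evaluate` are all made up for illustration, and phase 2 simply walks the single activated path instead of propagating activation signals through OR chains.

```python
from itertools import product

# Hypothetical FBDD: node name -> ((kind, index), low child, high child).
# kind is "x" for an input variable and "y" for an output variable;
# children are node names or the terminals 0 and 1.
NODES = {
    "n0": (("x", 0), "n1", "n2"),
    "n1": (("y", 0), 0, 1),
    "n2": (("x", 1), "n3", 0),
    "n3": (("y", 0), "n4", 1),
    "n4": (("y", 1), 0, 1),
}
ROOT = "n0"

def T(x, y):
    """Reference relation encoded by NODES (used only for checking)."""
    if x[0] == 0:
        return y[0] == 1
    if x[1] == 0:
        return bool(y[0] or y[1])
    return False

def evaluate(x, p):
    """Return (y, v) for input bits x and parametric bits p."""
    # Phase 1 (terminal to root): find[n] is 1 iff node n can reach the
    # 1 terminal under the present x, for some choice of outputs.
    find = {0: 0, 1: 1}

    def phase1(n):
        if n not in find:
            (kind, i), lo, hi = NODES[n]
            f0, f1 = phase1(lo), phase1(hi)
            find[n] = (f1 if x[i] else f0) if kind == "x" else (f0 | f1)
        return find[n]

    v = phase1(ROOT)

    # Phase 2 (root to terminal): activate one path, steered by x at
    # input nodes and by p at output nodes whenever both edges are viable.
    chosen = {}
    n = ROOT
    while v and n not in (0, 1):
        (kind, i), lo, hi = NODES[n]
        if kind == "x":
            bit = x[i]
        else:
            bit = p[i] if (find[lo] and find[hi]) else find[hi]
            chosen[i] = bit
        n = hi if bit else lo

    # Phase 3: an output takes the value fixed on the path, or p_i if
    # the path skipped every node labelled by that output.
    y = tuple(chosen.get(i, p[i]) for i in range(len(p)))
    return y, v

for x in product((0, 1), repeat=2):
    for p in product((0, 1), repeat=2):
        print(x, p, evaluate(x, p))
```

When v is 0 the returned y is simply p and carries no meaning, which matches the role of v as a validity flag.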

2.2 Input Module 




Fig. 3. Module for Input Node (signals: find_out, choose_in, find_in_0, find_in_1, choose_out_chain_0, choose_out_chain_1, choose_out_0, choose_out_1) 



Figure 3 shows the internal details of the module to be substituted for each BDD node 
labelled with a circuit primary input x_i. The upper part of the circuit is a multiplexor 
whose data inputs are the signals from the two outgoing edge destination modules. The 
multiplexor control is the signal x_i. If there is a compatible path to the 1 terminal along 
the edge labeled by the present value of x_i, then there is a compatible path from this 
node to the 1 terminal. 

The lower part of the circuit steers the active path in the second phase. If this node 
is marked as active by one of its incoming edges, then it activates one of its destination 
nodes as chosen by the value of x_i. The OR gates at the outputs work with the other nodes 
that have edges to the same destination, so that if any of these source nodes activate their 
edges to that destination, then the activation will reach that node. 
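
As a rough illustration, the input module can be written as a small Boolean function. The signal names follow the labels of Figure 3, but the figure itself is not reproduced here, so the direction assigned to the chain signals (chain values enter, OR-ed values leave) is an assumption drawn from the description above.

```python
def input_module(x_i, find_in_0, find_in_1, choose_in,
                 choose_out_chain_0=0, choose_out_chain_1=0):
    """Behavioural sketch of the module for a node labelled by input x_i.

    All signals are 0/1 integers. The chain inputs carry the activation
    already collected from other modules sharing the same destination.
    """
    # Upper part (phase 1): multiplexor selecting the find signal of the
    # outgoing edge that matches the present value of x_i.
    find_out = find_in_1 if x_i else find_in_0

    # Lower part (phase 2): pass activation to the edge chosen by x_i,
    # OR-ed with activation arriving along the chain.
    choose_out_0 = choose_out_chain_0 | (choose_in & (1 - x_i))
    choose_out_1 = choose_out_chain_1 | (choose_in & x_i)
    return find_out, choose_out_0, choose_out_1

print(input_module(1, 0, 1, 1))
print(input_module(0, 0, 1, 1))
```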

2.3 Output Module 

Figure 4 shows the module to substitute for a BDD node labelled by a primary output 
variable y_i. Again the upper part is for the first phase of propagation and the lower part 
for the second phase. In the first phase the circuit computes the paths through the BDD 
allowed by the current values of the inputs x before the output values are picked. There 
is a path to the 1 terminal through this output node compatible with the x inputs if there 
is a path from either of its destination nodes. So a simple OR gate is enough for the first 
phase. 

In the second phase, if an incoming edge to the module is activated then it must 
choose an outgoing edge to activate. The val_in signal to the module is driven in the top 
level circuit by the parametric input p_i. If both outgoing edges indicate the existence of 
paths to the 1 terminal, then the edge suggested by p_i will be activated. If only one edge 
has a path to the 1 terminal, then that edge will be activated and the p_i value is ignored. 

The value for y_i will be computed in the third phase of computation in the circuit. 
The OR gate at the top right of the diagram works together with all the other nodes 
labeled by y_i. If any of these nodes has been activated and has chosen the 1 value, then 
the output should be driven to 1. Otherwise the 0 value should be chosen. 

The OR gate at the top left accumulates a value for y_i indicating whether any of 
the BDD nodes labelled by y_i were activated. This value is then fed to the multiplexor 
which chooses the final value for y_i. If a node was activated, then the value determined 
by that node and passed along through the value chain should be chosen. If no node was 
chosen, this indicates that an edge was activated that skipped over the y_i. In this case its 
value is unconstrained by the present values of the x inputs, and the value of p_i should 
be chosen. 
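
A behavioural sketch of the output module and the per-output multiplexor is given below. Only `val_in` and the find/choose names come from the text; `value_chain_in`, `active_chain_in`, `output_mux`, and the exact gate arrangement are hypothetical stand-ins for the two chain signals described above, since Figure 4 is not reproduced here.

```python
def output_module(val_in, find_in_0, find_in_1, choose_in,
                  choose_out_chain_0=0, choose_out_chain_1=0,
                  value_chain_in=0, active_chain_in=0):
    """Behavioural sketch of the module for a node labelled by output y_i."""
    # Phase 1: a path to the 1 terminal exists through either edge.
    find_out = find_in_0 | find_in_1

    # Phase 2: follow val_in (driven by p_i) when both edges are viable,
    # otherwise take the only viable edge.
    pick_1 = find_in_1 & (val_in | (1 - find_in_0))
    pick_0 = find_in_0 & ((1 - val_in) | (1 - find_in_1))
    choose_out_0 = choose_out_chain_0 | (choose_in & pick_0)
    choose_out_1 = choose_out_chain_1 | (choose_in & pick_1)

    # Phase 3 chains: did any y_i node choose 1, and was any y_i node active?
    value_chain_out = value_chain_in | (choose_in & pick_1)
    active_chain_out = active_chain_in | choose_in
    return (find_out, choose_out_0, choose_out_1,
            value_chain_out, active_chain_out)

def output_mux(active_chain, value_chain, p_i):
    """Final value of y_i: the chained value if a y_i node was active, else p_i."""
    return value_chain if active_chain else p_i

print(output_module(0, 1, 1, 1))
print(output_mux(0, 0, 1))
```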



3 Circuit Correctness 

Let T_C(x, p, y, v) denote the input-output relation of the circuit constructed from the 
Boolean relation T(x, y). In this section we prove that T_C correctly implements T. We 
start with two lemmas, whose proofs use induction based on a topological ordering of 
the nodes of the FBDD. Each node n in an FBDD represents some Boolean function 
f_n(x, y). 

Lemma 1. For a given x, the “find_out” signal from a module n has the value 1 iff, for 
some y, f_n(x, y) = 1. 

Proof (sketch): This is clearly true for a node whose edges both lead to terminal nodes. 
If the find_in_0 and find_in_1 edges reflect the existence of a path to the 1 terminal, then 
each module will in turn determine if a path exists flowing through the corresponding 
node. Thus by induction, for all modules find_out reflects the existence of a path. ■ 

Since the v output is given by the find_out signal of the module corresponding to the 
root node, this shows the value of v for a given x is ∃y.T(x, y). 

Lemma 2. The “choose_in” signals will activate a single path through the BDD from 
the root node to the 1 terminal, if any such path exists for the given input x. 

Proof (sketch): A topological order of the BDD nodes determines a series of cuts 
which partition the nodes into a root set and a terminal set, where any BDD edge that 
crosses the cut will be directed from a node in the root set to a node in the terminal set. 
We can prove inductively that the number of activated edges crossing any cut is exactly 
1 if ∃y.T(x, y), and 0 otherwise. The induction starts with the base case of the cut with 
all the BDD nodes on the terminal side, where the root edge coming into the root node 
is activated just in case ∃y.T(x, y) is true. Now we assume that the i'th cut has a single 
activated edge crossing it, and show that the (i+1)'th cut will also be crossed by a single 
activated edge. Each node will activate a single outgoing edge if the incoming edge is 
activated, or neither outgoing edge if the incoming edge is not activated. Thus each node 
n preserves the number of activated edges crossing the cuts just before and after n. ■ 

With these basic properties of the circuit established, we can now prove the 
correctness of T_C. 

Theorem 1. The input-output relation T_C(x, p, y, v) for the circuit built from the 
relation T(x, y) satisfies: 

$T(x, y) \Rightarrow T_C(x, y, y, 1)$ 

$T_C(x, p, y, 1) \Rightarrow T(x, y)$ 

Proof (sketch): Since any path from root to terminal includes at most one node labelled 
by any given output variable y_i, the third phase will propagate to the output y_i the 
value computed corresponding to the outgoing edge from that node (or the corresponding 
p_i if no such node is included). Thus 

$T_C(x, p, y, 1) \Rightarrow T(x, y)$ 

Since each output module attempts to steer the path to follow the choices suggested 
by the parametric input p, the path activated will drive the outputs to p if T(x, y) holds 
for y = p. This together with the value of v shows that 

$T(x, y) \Rightarrow T_C(x, y, y, 1)$ 



4 Circuit Compactness 

Since the circuit constructed by this technique has the same high level topology as 
the BDD of the input-output relation, its size is proportional to the BDD. The circuit 
generated is not at all locally optimal. There are many constant inputs to modules, such 
as the constants corresponding to the BDD terminal nodes and the starting points of the 
various node chains. There are also disconnected outputs of modules, principally at the 
last edges of the second pass root-to-terminal propagation, since the activation signal 
does not have to be propagated to the terminal nodes. These constants and disconnected 
outputs provide straightforward opportunities for simple local logic optimization, but 
other more sophisticated techniques could also be applied. As long as the input-output 
relation’s BDD is reasonably compact, the circuit we construct should provide an efficient 
high level structure and a good starting point for such low level optimization, which could 
be followed by mapping to a specific technology if the circuit is to be manufactured as 
digital hardware. 

In order to generate a more efficient circuit, before converting the BDD to a circuit 
one could apply exact [11] or approximate [16] variable reordering techniques to attempt 
to reduce the size of the BDD. Since the circuit construction procedure we provide here 
also applies to the Free BDD representation of an input-output relation, one could further 
reduce the size of the circuit by exploiting the freedom to use different variable orderings 
on different branches of the diagram [10]. 

Kukula et al. [13] observe that a relation between radix-encoded integers definable 
in Presburger arithmetic will have a compact, regularly-structured OBDD so long as the 
variable ordering interleaves the bits in the order of the encoding weights. In this case 
the circuit constructed will have a bit slice form, a linear array of repeated instances of 
a single module. Such a circuit is not only efficient in terms of gate count but also lends 
itself to an efficient physical layout. 
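
The effect of interleaving can be observed on a small case. The sketch below is an illustrative experiment rather than anything taken from the paper: it counts the internal nodes of the reduced OBDD of the n-bit addition relation c = a + b (modulo 2^n) by enumerating distinct subfunctions of the truth table, once with the bits interleaved from low to high weight and once with the variables grouped by operand. The helper names `obdd_size`, `adder_relation`, `interleaved`, and `blocked` are assumptions made for this example.

```python
from itertools import product

def obdd_size(f, order):
    """Internal node count of the reduced OBDD of f under `order`.

    A node labelled by the k-th variable exists for each distinct
    subfunction, obtained by fixing the first k variables, that
    actually depends on the k-th variable.
    """
    m = len(order)
    table = tuple(f(dict(zip(order, bits)))
                  for bits in product((0, 1), repeat=m))
    total = 0
    for k in range(m):
        size = 2 ** (m - k)
        chunks = {table[j:j + size] for j in range(0, len(table), size)}
        total += sum(1 for c in chunks if c[:size // 2] != c[size // 2:])
    return total

def adder_relation(n):
    """Characteristic function of c = (a + b) mod 2**n over bit variables."""
    def f(env):
        a = sum(env[("a", i)] << i for i in range(n))
        b = sum(env[("b", i)] << i for i in range(n))
        c = sum(env[("c", i)] << i for i in range(n))
        return int((a + b) % (1 << n) == c)
    return f

def interleaved(n):
    # a0, b0, c0, a1, b1, c1, ... (bits grouped by encoding weight)
    return [(name, i) for i in range(n) for name in "abc"]

def blocked(n):
    # a0..a(n-1), b0..b(n-1), c0..c(n-1) (bits grouped by operand)
    return [(name, i) for name in "abc" for i in range(n)]

for n in (2, 3, 4):
    f = adder_relation(n)
    print(n, obdd_size(f, interleaved(n)), obdd_size(f, blocked(n)))
```

With the interleaved order each added bit position contributes the same number of nodes, which is the regular structure that yields a bit-slice circuit; the blocked order has to remember whole operands and grows much faster.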



5 Related Work 

Brown [7] discusses parametric general solutions for Boolean equations. His method of 
successive elimination will give the same parametric functions as implemented by the 
circuit we construct in the special case of an OBDD. Brown’s methods deal with general 
Boolean functions rather than specific circuit implementations or BDD representations, 
and in particular he does not address the issue of circuit size. 

Our construction technique is most closely related to the stimulus generation al- 
gorithm of Yuan et al. [19] and the parametric constraint representation of Aagaard et 
al. [1]. Yuan et al. present an algorithm which generates random stimuli satisfying a 
constraint, represented as a BDD, which may depend on state variables of the design. 
Our circuit has a flow very similar to their algorithm. The main differences between their 
work and ours are that our technique constructs a compact circuit rather than generating 
a single stimulus instance, and the output value selection of our circuit is controlled by 
parametric inputs, rather than weighted random numbers as in Yuan et al. The circuit 
constructed by our technique can be used by a wide variety of downstream tools such as 
SAT or symbolic simulation. 

Aagaard et al. present an algorithm which generates a vector of OBDDs over a set 
of parametric input variables; the combinations of values of these BDDs span the space 
of stimulus values which satisfy some constraint. The constraints they deal with do not 
involve any dependence on state variables in the design, and hence their technique is 
limited to unary relations. In contrast, we work with the more general problem of binary 
relations. Another difference is that their algorithm generates a parametric result in the 
form of OBDDs. Some relations will not admit a tractable parametric representation 
as a vector of OBDDs. Since we map directly from a Free BDD representation of the 
input-output relation to a circuit, our technique can be used with a broader range of 
relations. 

Other related work includes synthesis of multiplexor circuits from BDDs [2]. In that 
work, a multi-rooted BDD defines the vector of output functions to be implemented. Our 
technique differs, working instead with the input-output relation and also in working 
with incomplete and/or non-deterministic functions. Synthesis from the input-output 
relation can result in circuits considerably more compact than those built from the multi- 
rooted functional BDD used in multiplexor synthesis. Consider the arithmetic function 
max(a, b + c), where a, b, and c are n-bit vectors representing integers in the usual 
radix-2 encoding. Each output bit of this function can be represented by a BDD with 
size bound O(n), by using an interleaved variable ordering. With n output bits to be 
represented, the total shared size of the multi-rooted BDD will be quadratic in n unless 
there is significant node sharing across the multiple outputs. But there is a conflict 
between the variable orders required by the max and addition functions if nodes are 
to be shared. With the low-order bits ordered at the top, the max function will give a 
compact multi-rooted BDD representation, since the high-order bit nodes at the bottom 
of the BDD can be shared by all the low order nodes. However, efficient multi-rooted 
representation of addition requires the low order bits at the bottom to be shared by all 
the high order nodes. Whichever order is chosen, one function or the other will fail to 
share the nodes at the bottom of the BDD. Thus the entire multi-rooted BDD will end 
up being quadratic in the bit-width. In contrast to this, the circuit constructed by our 
technique will grow only linearly with the bit-width since the input-output relation is 
definable in Presburger arithmetic. 



6 Conclusion 

We have presented a simple and direct mapping from a Free BDD representing an input- 
output relation T{x, y) to a compact combinational circuit. This mapping supports both 
incomplete and non-deterministic relations by means of a supplementary output signal 
and a set of supplementary parametric inputs. The combinational circuit provides a 
flexible representation which can be used for verification or synthesis. 

The usefulness of the OBDD representation has led to a variety of extensions. We 
have presented our circuit construction technique in terms of the more general Free 
BDD representation. Another common extension is to add various attributes to the BDD 
edges, for example complementation [6]. Edge complementation can reduce BDD size 
by up to a factor of two. It is quite possible to construct a circuit directly from a BDD 
with complemented edges, but the modules required grow by about a factor of two, 
so there doesn’t appear to be any advantage. Our next steps in this research will be to 
investigate other extensions to OBDDs to see which of them can support effective circuit 
construction techniques. 



Abstract. In this paper we show how to do symbolic model checking 
using Boolean Expression Diagrams (BEDs), a non-canonical representa- 
tion for Boolean formulas, instead of Binary Decision Diagrams (BDDs), 
the traditionally used canonical representation. The method is based on 
standard fixed point algorithms, combined with BDDs and SAT-solvers 
to perform satisfiability checking. As a result we are able to model check 
systems for which standard BDD-based methods fail. For example, we 
model check a liveness property of a 256 bit shift-and-add multiplier and 
we are able to find a previously undetected bug in the specification of a 
16 bit multiplier. As opposed to Bounded Model Checking (BMC) our 
method is complete in practice. 

Our technique is based on a quantification procedure that allows us to 
eliminate quantifiers in Quantified Boolean Formulas (QBF). The basic 
step of this procedure is the up-one operation for BEDs. In addition we 
list a number of important optimizations to reduce the number of basic 
steps. In particular the optimization rule of quantification-by-substitution 
turned out to be very useful: ∃x : g ∧ (x ⇔ f) = g[f/x]. The rule is used 
(1) during fixed point iterations, (2) for deciding whether an initial set 
of states is a subset of another set of states, and finally (3) for iterative 
squaring. 

1 Introduction 

Symbolic model checking has been performed using fixed point iterations for 
quite some time [11]. The key to the success is the canonical Binary Decision 
Diagram (BDD) [8] data structure for representing Boolean functions. However, 
such a representation explodes in size for certain functions. In this paper we 
show how to do symbolic model checking using Boolean Expression Diagrams 
(BEDs) [2,3], a non-canonical representation of Boolean functions. The method 
is theoretically complete as we only change the representation and not the al- 
gorithms. Dropping the canonicity requirement has both advantages and disad- 
vantages: Non-canonical data structures are more succinct than canonical ones 
- sometimes exponentially more. Determining satisfiability of Boolean functions 
is easy with canonical data structures, but with non-canonical data structures 
it is hard. We show how to overcome the disadvantages and exploit some of the 
advantages in symbolic model checking. 

As a non-canonical representation, BEDs do not allow for constant time 
satisfiability checking. Instead we use two different methods for satisfiability 
checking: (1) SAT-solvers like Grasp [15] and Sato [18], and (2) conversion of 
BEDs to BDDs. BDDs are canonical and thus satisfiability checking is a constant 
time operation. We perform symbolic model checking the classical way with fixed 
point iterations. One of the key elements of our method is the quantification-by- 
substitution rule: ∃x : g ∧ (x ⇔ f) = g[f/x]. The rule is used (1) during fixed 
point iterations, (2) while deciding whether an initial set of states is a subset of 
another set of states, and finally (3) while doing iterative squaring. 
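
The quantification-by-substitution rule can be checked by brute force on small functions. The sketch below represents Boolean functions as Python callables over an environment dictionary; the helpers `exists`, `substitute`, and `rule_holds`, as well as the sample functions g and f, are hypothetical and chosen only for illustration. It assumes, as the rule requires, that x does not occur in f.

```python
from itertools import product

def exists(var, h):
    """Existential quantification of `var` in the Boolean function h."""
    return lambda env: h({**env, var: False}) or h({**env, var: True})

def substitute(g, var, f):
    """The function g[f/var]: evaluate g with var bound to the value of f."""
    return lambda env: g({**env, var: f(env)})

def rule_holds(g, f, var, free_vars):
    """Check  exists var : g and (var <=> f)  ==  g[f/var]  on all inputs."""
    lhs = exists(var, lambda env: g(env) and (env[var] == f(env)))
    rhs = substitute(g, var, f)
    return all(
        bool(lhs(dict(zip(free_vars, bits)))) == bool(rhs(dict(zip(free_vars, bits))))
        for bits in product((False, True), repeat=len(free_vars))
    )

# Sample functions: g is "if x then a else b", f is "a xor b".
g = lambda env: (env["x"] and env["a"]) or (not env["x"] and env["b"])
f = lambda env: env["a"] != env["b"]

print(rule_holds(g, f, "x", ["a", "b"]))
```

The point of the rule is that the right-hand side needs no quantifier at all: the conjunct x ⇔ f pins x to a single value, so the existential collapses to a substitution.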

While complete in the sense that it handles full CTL [13] model checking, our 
method performs best if the system has few inputs and the transition relation 
can be written as a conjunction of next-state functions. The reason is that this 
allows us to fully exploit the quantification-by-substitution rule. 

Using our method, we can model check a liveness property of a 256 bit shift- 
and-add multiplier, which requires 256 iterations to reach the fixed point. This 
should be compared with the 23 bit multipliers that standard BDD methods can 
handle. In fact, we are able to detect a previously unknown bug in the specifica- 
tion of a 16 bit multiplier. It was generally thought that iterative squaring was 
of no use in model checking. However, we show that iterative squaring enables 
us to calculate the reachable set of states for all 32 outputs of a 16 bit multiplier 
faster than without iterative squaring. 

Model checking was invented by Clarke, Emerson, and Sistla in the 1980s [13]. 
Their model checking method required an explicit enumeration of states which 
limited the size of the systems they could handle. Burch et al. [11] showed how 
to do model checking without enumerating the states. They called this symbolic 
model checking. The idea is to represent sets of states by characteristic functions. 
The data structure of Binary Decision Diagrams turns out to be a very efficient 
representation for characteristic functions. The advantages of BDDs are com- 
pactness, canonicity, and ease of manipulation. Since the appearance of BDDs, 
many other related data structures have been proposed. Bryant gives an over- 
view in [9]. One such data structure is the Boolean Expression Diagram. It is a 
generalization of BDDs. In this paper we will study BEDs for use in symbolic 
model checking. 

Biere, Clarke et al. have proposed Bounded Model Checking (BMC) as an 
alternative method to BDD-based model checking [4,5,6]. They unfold the transition 
relation and repeatedly look for longer and longer counterexamples, and 
they use SAT-solvers instead of BDDs. BMC is good at finding errors with 
short counterexamples. The diameter of the system determines the number of 
unfoldings of the transition relation that are necessary in order to prove the cor- 
rectness of the circuit. Unfortunately, for many examples the diameter cannot be 
calculated and the estimates are too rough. In such cases BMC reduces to a par- 
tial verification method in practice. Our method does not need the computation 
of the diameter or approximations of it. 

The work most closely related to ours is by Abdulla, Bjesse and Eén. They 
consider symbolic reachability analysis using SAT-solvers [1]. For representing 
Boolean functions they use the Reduced Boolean Circuit data structure which 
closely resembles our Boolean Expression Diagrams. They perform reachability 
analysis using a fixed point iteration. Both of us make use of the quantification- 
by-substitution rule. They use Stålmarck’s patented method [17] to determine 
satisfiability of Boolean functions. While related, their method and ours differ 
in a number of ways: In essence, the basic step in their and our quantification 
algorithm can be computed by the up-one [2,3] BED-algorithm. Therefore we 
think BEDs are the most natural representation in this context. We handle 
full CTL while they concentrate on reachability (their tool does handle full 
CTL, but they have only reported reachability results so far). In our method the 
quantification-by-substitution rule is extensively used at three different places and 
not just during fixed point calculation. We have heuristics for choosing different 
SAT procedures depending on the expected result of the satisfiability check. 
Candidates are various SAT-solvers or an explicit BED to BDD conversion. We 
use SAT-solvers if the formula is expected to be satisfiable and either SAT- 
solvers or an explicit BED to BDD conversion if the formula is expected to be 
unsatisfiable. In their work they only use SAT-solvers. BEDs are always locally 
reduced and we identify further important simplification rules. Finally we make 
use of iterative squaring. 

This paper is organized as follows. In section 2, we review the BED data 
structure. In section 3, we show how to do model checking using BEDs. In 
section 4, we give three applications of the quantification-by-substitution rule. In 
section 5, we deal with the size of BEDs. In section 6, we present the experimental 
results. Finally in section 7, we conclude. 

2 Boolean Expression Diagrams 

A Boolean Expression Diagram [2,3] is a data structure for representing and 
manipulating Boolean formulas. In this section we review the data structure. 

Definition 1 (Boolean Expression Diagram). A Boolean Expression Diagram 
(BED) is a directed acyclic graph G = (V, E) with vertex set V and edge set 
E. The vertex set V contains four types of vertices: terminal, variable, operator, 
and quantifier vertices. 

– A terminal vertex v has as attribute a value val(v) ∈ {0, 1}. 

– A variable vertex v has as attributes a Boolean variable var(v), and two 
children low(v), high(v) ∈ V. 

– An operator vertex v has as attributes a binary Boolean operator op(v), and 
two children low(v), high(v) ∈ V. 

– A quantifier vertex v has as attributes a quantifier quant(v) ∈ {∃, ∀}, a 
Boolean variable var(v), and one child low(v) ∈ V. 

The edge set E is defined by 

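$E = \{(v, \mathit{low}(v)) \mid v \in V \text{ and } v \text{ has the } \mathit{low} \text{ attribute}\}$ 

$\quad \cup\ \{(v, \mathit{high}(v)) \mid v \in V \text{ and } v \text{ has the } \mathit{high} \text{ attribute}\}.$ 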

The relation between a BED and the Boolean function it represents is 
straightforward. Terminal vertices correspond to the constant functions 0 and 1. Variable 
vertices have the same semantics as vertices of BDDs and correspond to 
the if-then-else operator x → f_1, f_0 defined as (x ∧ f_1) ∨ (¬x ∧ f_0). Operator 
vertices correspond to their respective Boolean connectives. Quantifier vertices 
correspond to the quantification of their associated variable. This leads to the 
following correspondence between BEDs and Boolean functions: 

Definition 2. A vertex v in a BED denotes a Boolean function f^v defined 
recursively as: 

– If v is a terminal vertex, then f^v = val(v). 

– If v is a variable vertex, then f^v = var(v) → f^{high(v)}, f^{low(v)}. 

– If v is an operator vertex, then f^v = f^{low(v)} op(v) f^{high(v)}. 

– If v is a quantifier vertex, then f^v = quant(v) var(v) : f^{low(v)}. 

The BED data structure is a representation form for formulas in QBF. If we 
disallow quantifier vertices, we get a representation form for propositional logic. If 
we disallow both operator and quantifier vertices, we get a BDD. As an example, 
Figure 1 shows a BED for the formula ∀b : a ∨ (a ∧ b) a. 
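
Definitions 1 and 2 translate almost directly into a small data structure. The following Python sketch is an illustration only: the class names, the string encoding of quantifiers, and the `evaluate` function are assumptions made here and do not describe any existing BED package. Quantifiers are evaluated naively by trying both values of the bound variable.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass(frozen=True)
class Terminal:
    val: int                          # 0 or 1

@dataclass(frozen=True)
class Variable:
    var: str
    low: "Vertex"
    high: "Vertex"

@dataclass(frozen=True)
class Operator:
    op: Callable[[int, int], int]     # binary Boolean operator
    low: "Vertex"
    high: "Vertex"

@dataclass(frozen=True)
class Quantifier:
    quant: str                        # "exists" or "forall"
    var: str
    low: "Vertex"

Vertex = Union[Terminal, Variable, Operator, Quantifier]

def evaluate(v, env):
    """Value of the function denoted by vertex v under the assignment env."""
    if isinstance(v, Terminal):
        return v.val
    if isinstance(v, Variable):       # if-then-else on var(v)
        return evaluate(v.high, env) if env[v.var] else evaluate(v.low, env)
    if isinstance(v, Operator):       # connective applied to both children
        return v.op(evaluate(v.low, env), evaluate(v.high, env))
    results = [evaluate(v.low, {**env, v.var: bit}) for bit in (0, 1)]
    return int(any(results)) if v.quant == "exists" else int(all(results))

ZERO, ONE = Terminal(0), Terminal(1)
a = Variable("a", ZERO, ONE)
b = Variable("b", ZERO, ONE)
# Sample formula (not the one in Figure 1): exists b : a and b
formula = Quantifier("exists", "b", Operator(lambda p, q: p & q, a, b))

print(evaluate(formula, {"a": 1}), evaluate(formula, {"a": 0}))
```

Dropping the `Quantifier` class leaves a representation of propositional formulas, and dropping `Operator` as well leaves plain BDD-style vertices, mirroring the restrictions described above.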

There exist algorithms for transforming a BED into a BDD. One such algo- 
rithm is up-one. It sifts variables one at a time to the root of the BED. Using 
up-one repeatedly to sift all the variables transforms the BED to a BDD. We 
refer the reader to [2,3,14] for a more detailed description of up-one and its 
applications. 



3 Model Checking 

In this section, we review the standard model checking algorithm. The system 
to be verified is represented as a Kripke structure. A Kripke structure M is a 
tuple (S, I, T, ℒ), with a finite set of states S, a set of initial states I ⊆ S, a 
transition relation T ⊆ S × S, and a labeling of the states ℒ : S → P(A) with 
atomic propositions A. 

Fig. 1. The BED for ∀b : a ∨ (a ∧ b) a. All edges are directed downwards; the dashed 
edges being the low ones. 

A reactive system consists of a set of states and a set of inputs. The states 
are encoded as a Boolean vector of state variables s_1, ..., s_n. The inputs are 
also encoded as Boolean variables s_{n+1}, ..., s_m. These together form the state 
variables of the Kripke structure, s_1, ..., s_m. The atomic propositions correspond 
to the state variables. Each state is assumed to be labeled with the variables 
s_i that are 1 for that state. We use primed variables as next state variables, 
unprimed variables as current state variables, and we use characteristic functions 
over the state variables to represent sets. Since the inputs are non-deterministic, 
they are not constrained by the transition relation. Thus, the transition relation 
does not contain the primed versions of the input variables. 

There are two ways to specify a transition relation in an SMV [16] program: 
(a) by use of the “TRANS” statement, and (b) by use of the “ASSIGN” state- 
ment. In (a) one specifies the transition relation directly as a Boolean expression. 
In (b) one specifies next-state functions for state variables. Both methods can 
be used at the same time. We capture this as follows: 

$$T(s, s') = t(s, \hat{s}') \land \bigwedge_i \left( \tilde{s}'_i \Leftrightarrow f_i(s) \right) \qquad (1)$$

where $\hat{s}'$ and $\tilde{s}'$ form a partitioning of the primed state variables.
Here, $t(s, \hat{s}')$ comes from the “TRANS” statements and we call it the
relational part, while $\bigwedge_i (\tilde{s}'_i \Leftrightarrow f_i(s))$ comes from the
“ASSIGN” statements and we call it the functional part. (If a primed variable is
restricted by both “TRANS” and “ASSIGN” statements, we place it in the
relational part of $T$.) Our verification method performs best if the
transition relation is mainly in functional form.

We use CTL [13] formulas to capture the properties we want to verify. A
CTL formula characterizes a set of states, namely the set of states satisfying the
formula. This set can be computed by a fixed point iteration. The central part of
the fixed point iteration is the computation of relational products. A relational
product between the transition relation $T$ and a set of states $R$ is a new set of
states. In a forward computation, the new set is the set of states reachable in
one step from $R$. We call it the Image of $R$. In a backward computation, the new
set is the set of states which in one step can reach a state in $R$. We call it the
Preimage of $R$.

The following formulas show how to compute the image and preimage of $R$:

$$\mathit{Image}_{T,R}(s') = \exists s : T(s, s') \land R(s)$$

$$\mathit{PreImage}_{T,R}(s) = \exists s' : T(s, s') \land R(s')$$

For example, the algorithm in Figure 2 computes the characteristic function for
the set of states satisfying the CTL formula “AG P” (read: always globally P)
using backward iteration. It actually computes “$\neg$EF $\neg$P”: it first computes the
set of states from which there exists a path to a state where P does not hold,
and then returns the complement of that set, which has the property that P
holds along all paths.



AG P =
    $R_0 \leftarrow$ characteristic function for the set of states not satisfying $P$
    $i \leftarrow -1$
    repeat
        $i \leftarrow i + 1$
        $R_{i+1} \leftarrow R_i \lor \mathit{PreImage}_{T,R_i}(s)$
    until $R_{i+1} \Rightarrow R_i$
    return $\neg R_i$



Fig. 2. The algorithm for computing “AG P” using backward iteration. $T$ is the
transition relation for the system.
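To make the iteration in Figure 2 concrete, the following is a minimal explicit-state sketch in Python. It is an illustration only: it uses plain sets of states instead of BEDs and characteristic functions, and the names `preimage`, `check_ag` and the four-state example system are assumptions made for this sketch, not part of the method described here.

```python
def preimage(transitions, target):
    """States that have at least one successor in `target`."""
    return {s for (s, t) in transitions if t in target}


def check_ag(states, transitions, p_states):
    """Return the set of states satisfying AG P by backward iteration."""
    r = set(states) - set(p_states)              # R_0: states violating P
    while True:
        r_next = r | preimage(transitions, r)    # R_{i+1} = R_i or PreImage(R_i)
        if r_next <= r:                          # fixed point: R_{i+1} => R_i
            return set(states) - r               # complement of EF not-P
        r = r_next


# Hypothetical example: P fails only in state 3, which is reachable from 2.
states = {0, 1, 2, 3}
transitions = {(0, 1), (1, 0), (2, 3), (3, 3)}
p_states = {0, 1, 2}
ag_p = check_ag(states, transitions, p_states)
initial_ok = {0} <= ag_p                         # I is a subset of R
```

Here the set-inclusion test `r_next <= r` plays the role of the tautology check on $R_{i+1} \Rightarrow R_i$.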



A Kripke structure $M = (S, I, T, \mathcal{L})$ satisfies a specification $R$ if and only
if $I$ is a subset of $R$. In terms of characteristic functions this translates to the
implication: $I \Rightarrow R$.

3.1 Quantification 

The basic step in our quantification algorithm is to eliminate one quantified 
variable by the following rules: 

$$\exists x : f = f[0/x] \lor f[1/x] \qquad\qquad \forall x : f = f[0/x] \land f[1/x]$$

Note that this basic step can easily be computed by performing an up-one($f$, $x$)
BED-operation and then replacing the top level variable vertex by an appropriate
operator vertex.
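The two elimination rules can be illustrated on Boolean functions represented as Python callables over variable assignments. This is only a sketch of the rules themselves; it does not model up-one or the BED data structure, and all function names are assumptions made for the example.

```python
from itertools import product


def cofactor(f, var, value):
    """f[value/var]: fix one variable of a Boolean function over dict assignments."""
    return lambda env: f({**env, var: value})


def exists(var, f):
    """Exists var : f  =  f[0/var] or f[1/var]."""
    return lambda env: cofactor(f, var, False)(env) or cofactor(f, var, True)(env)


def forall(var, f):
    """Forall var : f  =  f[0/var] and f[1/var]."""
    return lambda env: cofactor(f, var, False)(env) and cofactor(f, var, True)(env)


def is_tautology(f, variables):
    """Check f under every assignment to the listed variables."""
    return all(
        f(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


# Absorption law, universally quantified over b: (a or (a and b)) <=> a
absorption = lambda env: (env["a"] or (env["a"] and env["b"])) == env["a"]
closed = forall("b", absorption)
absorption_holds = is_tautology(closed, ["a"])
```

Each eliminated quantifier evaluates the body twice, which mirrors the doubling of formula size discussed in the text.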

In the worst case, while removing a quantifier from a formula, we double
the formula size. Since each Image / Preimage computation involves existential
quantification of all $m$ state variables, we risk increasing the formula size by a
factor of up to $2^m$. In this section we present some syntactical transformations
which help us to perform the quantifications efficiently.

The most important transformation is the quantification-by-substitution rule. 
It allows us to replace an existential quantification by a substitution: 

$$\exists x : g \land (x \Leftrightarrow f) = g[f/x] \qquad (2)$$

where $x$ does not occur as a free variable in $f$.

Our verification method performs best when we can exploit the quantification- 
by-substitution rule. Such cases include systems with few inputs and systems 
with a transition relation that is mainly in functional form. After performing 
quantification-by-substitution, we quantify the remaining state variables (inclu- 
ding inputs) using the rules below. 
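The quantification-by-substitution rule (2) can be checked exhaustively on a small instance. The sketch below does this for one hypothetical choice of $g$ and $f$ (both are assumptions picked for illustration; any $f$ that does not mention $x$ would do).

```python
from itertools import product


def exists_x(h):
    """Existentially quantify the first argument x of h(x, y, z)."""
    return lambda y, z: h(False, y, z) or h(True, y, z)


# Hypothetical functions: g mentions x, f does not.
g = lambda x, y, z: (x and y) or (not x and z)
f = lambda y, z: y != z

# Left-hand side of rule (2): exists x : g and (x <=> f)
lhs = exists_x(lambda x, y, z: g(x, y, z) and (x == f(y, z)))
# Right-hand side of rule (2): g[f/x]
rhs = lambda y, z: g(f(y, z), y, z)

rule_holds = all(
    lhs(y, z) == rhs(y, z) for y, z in product([False, True], repeat=2)
)
```

The right-hand side evaluates $g$ once, whereas the quantified left-hand side evaluates it for both values of $x$; this is the saving the rule provides.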

By applying scope reduction rules to a formula, we can push quantifiers down
and thus reduce the potential blowup. The scope reduction rules are the following
(shown for negation, conjunction and disjunction):

$$\exists x : \neg f = \neg \forall x : f \qquad\qquad \forall x : \neg f = \neg \exists x : f$$

$$\exists x : f \lor g = (\exists x : f) \lor (\exists x : g) \qquad\qquad \forall x : f \land g = (\forall x : f) \land (\forall x : g)$$

$$\exists x : f(y) \land g(x) = f(y) \land (\exists x : g(x)) \qquad\qquad \forall x : f(y) \lor g(x) = f(y) \lor (\forall x : g(x))$$

Because BEDs are always reduced (for details see [2,3,14]), the quantifiers
disappear if they are pushed all the way to the terminals.

3.2 Satisfiability Checking 

There are two places where we need to determine whether a Boolean formula
represented by a BED is satisfiable. First we need to detect that a fixed point has
been reached in the computation of the set of states satisfying a CTL formula. Let
$R_i$ be the $i$th approximation to the fixed point. The fixed point has been reached
if $R_{i+1} = R_i$. Using characteristic functions, this translates to
$R_{i+1} \Leftrightarrow R_i$ being a tautology.
However, depending on the CTL operator, the series of approximations will
either be monotonically increasing or monotonically decreasing. It is therefore
enough to check set inclusion instead of set equivalence. In the increasing case we
check if $R_{i+1} \Rightarrow R_i$ is a tautology. In the decreasing case we check if
$R_i \Rightarrow R_{i+1}$ is a tautology. Until we reach the fixed point, these formulas
will not be tautologies. In other words, the negation of the formulas will be
satisfiable. SAT-solvers are good at finding a satisfying variable assignment so we
use a SAT-solver here.

Second we need to determine whether the initial set of states $I$ is a subset of
the set of states $R$ represented by the CTL specification. In particular we have
to check $I \Rightarrow R$ for tautology. There are two cases:

— The specification holds. This means that $I \Rightarrow R$ is a tautology. We could use
a SAT-solver to prove that the negation of $I \Rightarrow R$ is not satisfiable. However,
it is our experience that most SAT-solvers are not very good at proving
non-satisfiability. We can also use BDDs. By using the up-one algorithm, we
convert the BED for $I \Rightarrow R$ to a BDD.

— The specification does not hold. A proof will be a variable assignment
falsifying $I \Rightarrow R$ or, equivalently, a variable assignment satisfying
$\neg(I \Rightarrow R)$. SAT-solvers are good at finding such variable assignments.

Of course, we do not know beforehand whether the specification holds. A
possibility is to run a SAT-solver and a BED to BDD conversion in parallel.

SAT-solvers like Grasp [15] and Sato [18] expect their input to be a
propositional formula in conjunctive normal form (CNF). After the elimination of
quantifiers, as described in Section 3.1, we still need to convert BEDs into CNF.
For this conversion we use the well known technique of introducing new variables
for every non-terminal vertex [4].
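The idea of introducing one fresh variable per internal vertex can be sketched as follows. This is an illustrative Python version over nested-tuple formulas with only `and`, `or` and `not`; it is not the BED-to-CNF converter of the prototype. As a simplification, negation is folded into the literal sign instead of receiving its own variable, and the brute-force checker merely stands in for a real SAT-solver.

```python
from itertools import product


def to_cnf(expr):
    """Return (clauses, var_ids, num_vars) for a nested-tuple formula.

    Literals are non-zero integers; a negative integer is a negated variable.
    """
    clauses, var_ids, cache, counter = [], {}, {}, [0]

    def fresh():
        counter[0] += 1
        return counter[0]

    def visit(e):
        if e in cache:                       # shared sub-formulas are visited once
            return cache[e]
        kind = e[0]
        if kind == "var":
            lit = fresh()
            var_ids[e[1]] = lit
        elif kind == "not":
            lit = -visit(e[1])
        elif kind in ("and", "or"):
            a, b = visit(e[1]), visit(e[2])
            lit = fresh()                    # new variable for this vertex
            if kind == "and":                # lit <=> (a and b)
                clauses.extend([[-lit, a], [-lit, b], [lit, -a, -b]])
            else:                            # lit <=> (a or b)
                clauses.extend([[lit, -a], [lit, -b], [-lit, a, b]])
        else:
            raise ValueError(kind)
        cache[e] = lit
        return lit

    clauses.append([visit(expr)])            # assert the formula itself
    return clauses, var_ids, counter[0]


def brute_force_sat(clauses, num_vars):
    """Tiny stand-in for a SAT-solver: try every assignment."""
    for bits in product([False, True], repeat=num_vars):
        value = lambda lit: bits[abs(lit) - 1] ^ (lit < 0)
        if all(any(value(l) for l in clause) for clause in clauses):
            return True
    return False


a, b = ("var", "a"), ("var", "b")
contradiction = to_cnf(("and", a, ("not", a)))
satisfiable = to_cnf(("or", a, ("and", a, b)))
```

The number of clauses grows linearly with the number of vertices, which is why keeping the BED small matters for the SAT-solver.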

4 Applications of Quantification-by-Substitution 

4.1 Preimage Computation 

Consider the Preimage computation in section 3. If the transition relation T is 
written as in equation (1), then we can apply rule (2) directly for the functional 
part. This can be done in one traversal of the BED. Figure 3 shows the pseudo- 
code. The algorithm works in a bottom-up way replacing all variables from the 
functional part of T with their next-state function. Line 4 does the replacing 
by performing a Shannon expansion of the variable vertex and inserting the 
next-state function. 



PreImage($u$) =

1: if $u$ is a terminal then return $u$

2: $(l, h) \leftarrow$ (PreImage($\mathit{low}(u)$), PreImage($\mathit{high}(u)$))

3: if $u$ is a variable vertex with variable from the functional part of $T$ then

4:     return $(f_{\mathit{var}(u)} \land h) \lor (\neg f_{\mathit{var}(u)} \land l)$

5: else

6:     return makenode($\alpha(u)$, $l$, $h$)



Fig. 3. The algorithm for computing the Preimage of $u$ for the functional part of the
transition relation: $T_{\mathit{func}} = \bigwedge_i s'_i \Leftrightarrow f_i(s)$. The BED $u$ is
assumed to be quantifier-free. The tag $\alpha(u)$ is short for either $\mathit{var}(u)$ or
$\mathit{op}(u)$.
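The effect of this traversal, replacing each next-state variable by its next-state function, can be sketched on plain formula trees. The Python below is an assumption-laden illustration: it works on nested tuples rather than BEDs, performs a direct substitution instead of the Shannon expansion in line 4, and uses a hypothetical 2-bit counter as the system.

```python
def substitute(expr, next_state):
    """Simultaneously replace variables by their next-state functions."""
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "var":
        return next_state.get(expr[1], expr)   # untouched if no function is given
    return (kind,) + tuple(substitute(child, next_state) for child in expr[1:])


def evaluate(expr, env):
    """Evaluate a formula tree under an assignment of current-state variables."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "not":
        return not evaluate(expr[1], env)
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    if kind == "and":
        return left and right
    if kind == "or":
        return left or right
    if kind == "xor":
        return left != right
    raise ValueError(kind)


# Hypothetical 2-bit counter: s0' = not s0, s1' = s1 xor s0.
next_state = {
    "s0": ("not", ("var", "s0")),
    "s1": ("xor", ("var", "s1"), ("var", "s0")),
}
# R over the next state: both bits set (counter value 3).
R = ("and", ("var", "s0"), ("var", "s1"))
pre = substitute(R, next_state)
predecessors = [
    (s0, s1)
    for s0 in (False, True)
    for s1 in (False, True)
    if evaluate(pre, {"s0": s0, "s1": s1})
]
```

No quantifier is ever introduced: the preimage is obtained in a single pass over the formula, which is the point of applying rule (2) to the functional part.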



4.2 Set Inclusion 

We now describe a preprocessing step simplifying $I \Rightarrow R$, i.e., whether the
initial set of states is a subset of the states characterized by the specification.
The initial set of states $I$ often has the form:

$$I = \bigwedge_i s_i \Leftrightarrow \mathit{init}_i(s)$$

where $\mathit{init}_i(s)$ is the function describing the initial state for the variable
$s_i$. (Note that not all variables have an initial state specified.) In many cases
$\mathit{init}_i(s)$ is either a constant or a very simple function, and we can use this
fact to simplify $I \Rightarrow R$. Let $I$ be written $I' \land (s_i \Leftrightarrow \mathit{init}_i(s))$
and assume $\mathit{init}_i(s)$ does not depend on variable $s_i$. Recall that
$I \Rightarrow R$ is a tautology if and only if $\forall s_i : I \Rightarrow R$ is a tautology:

$$\begin{aligned}
\forall s_i : I \Rightarrow R
&= \forall s_i : \neg\left(I' \land (s_i \Leftrightarrow \mathit{init}_i(s)) \land \neg R\right) \\
&= \neg \exists s_i : I' \land (s_i \Leftrightarrow \mathit{init}_i(s)) \land \neg R \\
&= \neg (I' \land \neg R)[\mathit{init}_i(s)/s_i] \\
&= (I' \Rightarrow R)[\mathit{init}_i(s)/s_i]
\end{aligned}$$

The $[\mathit{init}_i(s)/s_i]$ means a substitution of $\mathit{init}_i(s)$ for $s_i$. This
reduces the number of variables and often simplifies the formula.
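As a sanity check of this simplification, the following Python sketch compares the tautology test before and after the substitution on a tiny hypothetical instance. The functions `init_1`, `I_rest` (playing the role of $I'$) and `R` are made up for the example.

```python
from itertools import product


def tautology(h, n):
    """True if the n-argument Boolean function h holds for all inputs."""
    return all(h(*bits) for bits in product([False, True], repeat=n))


# Hypothetical system with two state variables s1, s2.
init_1 = lambda s2: False                    # s1 is initially 0
I_rest = lambda s1, s2: True                 # I': no constraint on s2
I = lambda s1, s2: I_rest(s1, s2) and (s1 == init_1(s2))
R = lambda s1, s2: (not s1) or s2            # hypothetical specification set

# I => R over both variables.
original = tautology(lambda s1, s2: (not I(s1, s2)) or R(s1, s2), 2)
# (I' => R)[init_1(s)/s1] over the remaining variable only.
simplified = tautology(
    lambda s2: (not I_rest(init_1(s2), s2)) or R(init_1(s2), s2), 1
)
```

The simplified check ranges over one variable fewer, which is the reduction described in the text.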

4.3 Iterative Squaring 

Iterative squaring is a technique for reducing the number of iterations needed to 
reach the fixed point [10]. During reachability analysis we repeatedly square the 
transition relation: 

$$T^2(s, s') = \exists s'' : T(s, s'') \land T(s'', s')$$

Assume that $T$ is written as in equation (1). In general there is no way to square
$T$ and keep it in this form: the functional part will disappear. However, if we
restrict ourselves to transition relations purely in functional form, squaring can
be done easily:

$$\begin{aligned}
T^2(s, s') &= \exists s'' : T(s, s'') \land T(s'', s') \\
&= \exists s'' : \bigwedge_i \left(s''_i \Leftrightarrow f_i(s)\right) \land \bigwedge_i \left(s'_i \Leftrightarrow f_i(s'')\right) \\
&= \bigwedge_i \left(s'_i \Leftrightarrow f_i(s'')[f(s)/s'']\right)
\end{aligned}$$

where $[f(s)/s'']$ is a substitution of function $f_j(s)$ for variable $s''_j$ (for all $j$).
The algorithm is similar to the Preimage algorithm in Figure 3.

In this way we can compute $T^{2^k}$ in only $k$ steps. $T^{2^k}$ is a new transition
relation representing all paths in $T$ with a length of exactly $2^k$. However, it is
not possible to represent in functional form the transition relation allowing paths
of length up to $2^k$. As a consequence we cannot combine this form of iterative
squaring with, for example, frontier set simplifications.

Consider the algorithm in Figure 2. To use iterative squaring we simply
change $\mathit{PreImage}_{T,R_i}(s)$ to $\mathit{PreImage}_{T^{2^i},R_i}(s)$. As a result, $R_i$
represents the set of states reachable in up to and including $2^i - 1$ steps.
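For a purely functional transition relation, squaring amounts to composing the next-state function with itself. The Python sketch below illustrates the resulting backward iteration on explicit states; the saturating counter and all names are assumptions chosen for the example, and explicit sets again replace characteristic functions.

```python
def square(step):
    """Compose a deterministic next-state function with itself."""
    return lambda state: step(step(state))


def backward_reach_squaring(states, step, bad):
    """Backward reachability where iteration i uses T^(2^i)."""
    reached = set(bad)        # R_0
    power = step              # represents T^(2^0)
    iterations = 0
    while True:
        new = reached | {s for s in states if power(s) in reached}
        if new <= reached:    # fixed point reached
            return reached, iterations
        reached, power, iterations = new, square(power), iterations + 1


# Hypothetical saturating 4-bit counter; the "bad" state is 15.
states = range(16)
step = lambda s: min(s + 1, 15)
reached, iterations = backward_reach_squaring(states, step, {15})
```

State 0 is 15 steps away from the bad state, yet it is found after four productive iterations, matching the $2^i - 1$ bound stated above.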

5 BED Simplifications 

As we mentioned in section 3.2, transforming a BED to CNF increases the size
of the formula as we introduce a new variable for each BED non-terminal vertex.
It is therefore vital to keep the size of the BEDs small.

During the conversion of a BED to a BDD, the size may blow up. Even when
the final BDD is small (as for a tautology), the intermediate results might be
large. In this section we describe a method of keeping the BEDs small.

Keeping the BEDs reduced, as mentioned above, already gives us size reduc- 
tions due to, for example, constant propagation. But we can reduce the size of 
the BEDs even more. This can be achieved by increasing the sharing of vertices 
and by removing local redundancies. In [14] we describe a set of rewriting rules 
in detail. Here we will just mention some of the ideas: 

— Sharing can be increased by disallowing operator vertices which only differ
in the order of their children; for example $a \land b$ and $b \land a$. We fix an
ordering $<$ of vertices and only create operator vertices with low $<$ high.

— Size can be reduced by eliminating all negations below binary operators since
for all binary operators $\mathit{op}$ there exists another operator $\mathit{op}'$ with
$\mathit{op}'(x, y) = \mathit{op}(\neg x, y)$.

— Size can be reduced by not using all 16 binary Boolean operators but only a
subset of them. We use the set nand, or, left implication, right implication,
and bi-implication. (For clarity, the BED in Figure 1 has not been reduced
to this subset.)

— Size can be reduced by exploiting equivalences like the absorption laws, for
example $a \lor (a \land b) = a$, and distributive laws, for example
$(a \land b) \lor (a \land c) = a \land (b \lor c)$.

We apply all these rewriting rules each time we create a new operator vertex. 
The rules are important for the performance of up-one. 
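Two of these ideas, the fixed child order and the absorption law, can be sketched as rewriting performed at vertex-creation time. The Python class below is a hypothetical illustration only: it supports just `and` and `or` rather than the operator set named above, and it does not reproduce the rule set of [14].

```python
class Builder:
    """Tiny hash-consed formula store with rewriting on creation."""

    def __init__(self):
        self.table = {}   # unique table: structural key -> vertex id
        self.nodes = []   # vertex id -> structural key

    def _intern(self, key):
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def var(self, name):
        return self._intern(("var", name))

    def op(self, kind, a, b):
        if kind not in ("and", "or"):
            raise ValueError(kind)
        if a == b:                                  # x op x = x
            return a
        low, high = sorted((a, b))                  # canonical order: low < high
        dual = "and" if kind == "or" else "or"
        for x, y in ((low, high), (high, low)):
            key = self.nodes[y]
            if key[0] == dual and x in key[1:]:     # absorption: x or (x and z) = x
                return x
        return self._intern((kind, low, high))


bed = Builder()
a, b = bed.var("a"), bed.var("b")
ab = bed.op("and", a, b)
ba = bed.op("and", b, a)
absorbed = bed.op("or", a, ab)
```

Because the rewriting happens inside `op`, no redundant vertex is ever stored: after the calls above the store still contains only the two variables and one conjunction.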



6 Experimental Results 

We have constructed a prototype implementation of our proposed model checking
method. It performs CTL model checking on SMV programs. For the experiments
presented here we use Sato as our SAT-solver. It is worth mentioning
that for some examples Sato completes the tasks in seconds where Grasp takes
hours. For other examples the reverse is true. We compare our method with
the NuSMV model checker [12] and with Bwolen Yang’s modified version of
SMV¹, both of which are state-of-the-art in BDD-based model checking. Finally
we compare reachability results with FixIt from Abdulla, Bjesse, and Een [1].

The FixIt results are taken directly from the paper by Abdulla and his
group². All other experiments are run on a Linux computer with a Pentium Pro
200 MHz processor and 1 gigabyte of main memory.

¹ http://www.cs.cmu.edu/~bwolen

² From personal correspondence with the authors we have learned that they used a
296 MHz Sun UltraSPARC-II for the barrel shifter experiments and a 333 MHz Sun
UltraSPARC-IIi for the multiplier experiments.







6.1 Multiplier 

This example comes from the BMC-1.0f distribution³. It is a $16 \times 16 \rightarrow 32$
shift-and-add multiplier. The specification is the c6288 combinational multiplier from
the ISCAS’85 benchmark series [7]. For each output bit we verify that we cannot
reach a state where the shift-and-add multiplier has finished its computation and
the output bits of the two multipliers differ.

The multiplier fits into the category of SMV programs that we handle well. 
The operands are not modeled as inputs. Instead they are modeled as state 
variables with an unspecified initial state and the identity function as the next- 
state function. This lets us use quantification-by-substitution for all but the last 
iteration in the fixed-point calculation. Only in the last iteration do we need to 
quantify the operands out using the standard quantification methods. 

Table 1 shows the runtimes for verifying that the multiplier satisfies the
specification. Our BED-based method outperforms both NuSMV and Bwolen
Yang’s SMV as we are able to model check twice as many outputs as they do.
FixIt handles the same number of outputs as our method; however, for the more
difficult outputs, our method is faster by an order of magnitude.

Bit     BED    NuSMV   Bwolen    FixIt
  0     2.2       11      9.4      2.9
  1     2.3       23       17      3.1
  2     2.9       50       33      3.7
  3     3.8      130       71      4.8
  4     5.2      290      159      6.6
  5     7.0      702      383       11
  6     9.2        -     1031       20
  7      12        -        -       47
  8      16        -        -      150
  9      31        -        -      544
 10      68        -        -     2078
 11     352        -        -     8134
 12    2201        -        -    30330

Table 1. Runtimes in seconds for verifying the correctness of a 16 bit multiplier.
A dash indicates that the verification could not be completed with 800 MB of
memory.

For the most difficult output in Table 1, the fixed point iteration accounts for
only a fraction of the total runtime for our method. It takes less than a minute
and almost no memory to calculate the fixed point. By far the most time is spent
in proving $I \Rightarrow R$. SAT-solvers gave poor results, so we converted the BED for
$I \Rightarrow R$ to a BDD. The FixIt tool uses a SAT solver to check $I \Rightarrow R$. We expect
this is the reason why their runtimes are much longer than ours. However, FixIt
does not use much memory, while the memory required for the BED to BDD
conversion is quite large. Of course this is expected since the formulas originate
from multiplier circuits which are known to be difficult for BDDs. But even
though we have to revert to BDDs, we still outperform standard BDD-based
model checkers.

We did the experiments in Table 1 without use of iterative squaring to enable
fair comparisons. However, iterative squaring speeds up the fixed point
calculations. Table 2 shows the runtimes for calculating the fixed points, with and
without iterative squaring, for the same model checking problem as above. Note
the case for bit 30 where iterative squaring allows us to calculate the fixed point.
Without iterative squaring the SAT solver gets stuck. After each iteration the
SAT solver looks for new states. With iterative squaring many more new states
are added per iteration making it easier for the SAT solver to find a satisfying
assignment.

Bit    Without I.S.    With I.S.
  0             2.1          0.9
  5             6.8          1.6
 10              14          3.7
 15              16          8.3
 20              37           12
 25              19          8.8
 30      > 12 hours          6.4

Table 2. Runtimes in seconds for the fixed point calculation in verifying the
correctness of the 16 bit shift-and-add multiplier. Results are shown for
computations with and without iterative squaring (I.S.). The space requirements
are small, i.e., less than 16 MB.

³ http://www.cs.cmu.edu/~modelcheck

To see how our method handles erroneous designs, we introduced an error in
the specification of the multiplier by negating one of the internal nodes (this is
marked as “bug D” in the multiplier file in the BMC distribution). We observe
that the fixed points are computed in roughly the same amount of CPU time
and memory (both with and without iterative squaring). The difference arises when
we prove $I \Rightarrow R$. Using BED to BDD conversion as with the correct design,
we now get poorer results because $I \Rightarrow R$ is not a tautology and the final
BDD is not necessarily small. However, using a SAT-solver, we get much better
results. In many cases, the SAT-solver is able to find a counterexample almost
immediately. We are able to model check the first 19 outputs as well as some
of the later outputs of the multiplier using less than 16 MB of memory and one
minute of CPU time per output. NuSMV and Bwolen Yang’s SMV perform as
poorly as before.

We were able to find a bug in the “correct” specification of the multiplier 
for the two most significant outputs. Iterative squaring allowed us to quickly 
compute the fixed points, and Sato instantly found the errors. The total run- 
times to find these errors were seven and eight seconds, respectively. It turns 
out that the two outputs have been swapped. The original net-list for c6288 
does not contain information about which gates correspond to which multiplier 
outputs. However, each gate is numbered and the output numbers seem to be
increasing with the gate numbers, with the exception of the last pair of
outputs. This emphasizes the fact that SAT-based methods are good at finding
bugs in a system.

We constructed shift-and-add multipliers of different sizes and verified that
they always terminate, i.e., we checked “AF done”. The number of iterations
needed to reach the fixed point is equal to the size of the multiplier. This lets
us test how well our method handles cases with lots of iterations. Table 3 shows
the results. We compare our method with NuSMV and Bwolen Yang’s SMV.
Our method performs much better as we are both significantly faster and we are
able to handle much larger designs. We cannot compare with FixIt as they did
not report results for AF properties.



Size     BED    NuSMV   Bwolen
  16     1.6      2.2      5.2
  18     1.8       18      9.1
  20     2.0       90       24
  22     2.3      472      104
  23     2.7        -      253
  24     2.8        -        -
  32     3.7        -        -
  64      17        -        -
 128     119        -        -
 256    1185        -        -

Table 3. Runtimes in seconds for verifying that shift-and-add multipliers of
different sizes always terminate, i.e., we check “AF done”. The number of
iterations to reach the fixed point is equal to the size of the multiplier.



6.2 Barrel Shifter 

This example is a barrel shifter from the BMC-1.0f distribution and, like the
multiplier, it also falls within the category of systems which we handle well. A
barrel shifter consists of two register files. The contents of one of the register
files is rotated at each step while the other file stays the same. The width of a
register is $\log R$, where $R$ is the size of the register file.

The correctness of the barrel shifter is proven by showing that if two registers 
from the files have the same contents, then their neighbors are also identical. 
The set of initial states is restricted to states where this invariant holds. The left 
part of Table 4 shows the results. The BED and FixIt methods are both fast;
however, the BED method scales better and thus outperforms FixIt. NuSMV
and Bwolen Yang’s SMV are both unable to construct the BDD for the transition 
relation for all but the smallest examples. 

We prove liveness for the barrel shifter by showing that a pair of registers in 
the files will eventually become equal. The number of iterations for the fixed point 
calculation is equal to the size of the register file. The right part of Table 4 shows 
the results. We do not compare with FixIt as no results for this experiment were 
reported in [1]. As in the previous case, NuSMV and Bwolen Yang’s SMV can 
only handle small examples. 

        Invariant                            Liveness
Size    BED   NuSMV   Bwolen   FixIt        BED   NuSMV   Bwolen
   2    0.1     0.1      1.0     0.1        0.2     0.1      1.0
   4    0.3     0.2      2.5     0.1        0.5     0.2      2.1
   6    0.4     609        -     0.2        0.7     521        -
   8    0.4       -        -     0.5        0.9       -        -
  10    0.6       -        -     1.1        1.2       -        -
  20    1.9       -        -      14        3.2       -        -
  30    4.0       -        -      52        5.9       -        -
  40    8.0       -        -     231         11       -        -
  50     13       -        -     502         18       -        -
  60     19       -        -       ?         28       -        -
  70     30       -        -       ?         47       -        -

Table 4. Runtimes in seconds for invariant (left) and liveness (right) checking of the
barrel shifter example. A question mark indicates that the runtime for FixIt was not
reported in [1]. For the BED method we use Sato for checking satisfiability of
$I \Rightarrow R$.

7 Conclusion

We have presented a BED-based CTL model checking method based on the
classical fixed point iterations. Quantification is often the Achilles heel in CTL
fixed point iterations but by using quantification-by-substitution we are in some
cases able to deal effectively with it. While our method is complete, it performs
best on examples with a low number of inputs and where the transition relation
is mainly in functional form. In these situations we can fully exploit the
quantification-by-substitution rule.



We have shown how the quantification-by-substitution rule can also help sim- 
plify the final set inclusion problem of model checking and help perform efficient 
iterative squaring. Our proposed method combines SAT-solvers and BED to 
BDD conversions to perform satisfiability checking. We use a set of local rewrit- 
ing rules which helps to keep the size of the BEDs down. 

We have demonstrated our method by model checking large shift-and-add 
multipliers and barrel shifters, and we obtain results superior to standard BDD- 
based model checking methods. Furthermore, we were able to find a previously 
undetected bug in the specification of a 16 bit multiplier. 

Future work includes investigating two variable ordering problems. One is the
variable ordering when converting the BED for $I \Rightarrow R$ to a BDD. The variable
ordering is known to be very important in BDD construction, and since we,
in some cases, spend much time on converting $I \Rightarrow R$ to a BDD, our method
will benefit from a good variable ordering heuristic. The other problem is the
order in which we quantify the variables in the Preimage computation. This
will be interesting especially in cases where we cannot use the
quantification-by-substitution rule. Finally we are currently investigating how to
extend our method to work well for systems with many inputs.



Abstract. Several proof rules based on the assume-guarantee paradigm
have been proposed for compositional reasoning about concurrent systems.
Some of the rules are syntactically circular in nature, in that
assumptions and guarantees appear to be circularly dependent. While
these rules are sound, we show that several such rules are incomplete,
i.e., there are true properties of a composition that cannot be deduced
using these rules. We present a new sound and complete circular rule. We
also show that circular and non-circular rules are closely related. For the
circular rules defined here, proofs with circular rules can be efficiently
transformed to proofs with non-circular rules and vice versa.



1 Introduction 

In his landmark paper [Pnu77], Pnueli advocated the use of temporal logic as
a formalism for describing the correct operation of reactive systems [HP85].
To show that a reactive system, $M$, is correct, one specifies the correctness
condition for $M$ as an assertion, $f$, of temporal logic and applies proof techniques,
either automatic [CE81,QS82,CES86] or deductive [Pnu77,MP84], to show that
$M$ satisfies $f$ ($M \models f$).

Model checking [CE81,QS82] (cf. [CES86] [VW86]) is an automatic technique
for showing that $M \models f$. It is efficient, with complexity linear in the size of $M$
($|M|$) for temporal logics such as Computation Tree Logic (CTL) [CE81] and
Linear Temporal Logic (LTL) [Pnu77]. However, when $M$ is given as the parallel
composition of $n$ processes, each of size bounded by $K$, the size of $M$ may be $K^n$.
This state explosion problem is one of the main obstacles to the more wide-spread
application of model checking.

Compositional reasoning techniques form a promising approach to ameliorating
the state explosion problem. To prove that the parallel composition of $M_1$
with $M_2$, written as $M_1 // M_2$, satisfies the correctness specification h,
compositional techniques provide proof rules that justify the above correctness
assertion from two proofs done in isolation, the first stating the correctness of
$M_1$ and the second stating the correctness of $M_2$. For example, the following is
a typical rule:

$$\frac{\begin{array}{c} \{f\}\, M_1\, \{h\} \\ \{h\}\, M_2\, \{g\} \end{array}}{\{f\}\, M_1 // M_2\, \{g\}}$$

This rule works as follows. First, show $\{f\} M_1 \{h\}$, that is, that $M_1$ satisfies
$h$ under the assumption $f$. Second, show that $\{h\} M_2 \{g\}$. Then the correctness
assertion $\{f\} M_1 // M_2 \{g\}$ may be concluded as a consequence of the soundness of
the rule. Such compositional reasoning provides a benefit to the extent that direct
reasoning about $M_1 // M_2$ has been avoided. In general, however, determining the
appropriate auxiliary assertion $h$ may be highly non-trivial.

To ease the difficulty of determining the auxiliary assertions, several so-called
circular proof rules have been proposed. For example, consider the following rule
(cf. [McM99]):

$$\frac{\begin{array}{c} \{f\}\, M_1\, \{g_2 \triangleright g_1\} \\ \{f\}\, M_2\, \{g_1 \triangleright g_2\} \end{array}}{\{f\}\, M_1 // M_2\, \{G(g_1 \land g_2)\}}$$

Several points seem to differentiate this rule from the previous one. Firstly,
the form of the postconditions has been restricted to specific operators. The
property $q \triangleright p$ (read as “$q$ constrains $p$”) is true of a computation if, for all
$i$, $p$ is true at point $i$ of the computation if $q$ holds at all points $j < i$; this
can be expressed in LTL as $\neg(q\ U\ \neg p)$. The property $G(p)$ (read as “always $p$”)
is true of a computation if $p$ holds at all points of the computation. Secondly,
in the first sub-goal, $\{f\} M_1 \{g_2 \triangleright g_1\}$, $g_1$ is understood to be the correctness
assertion of $M_1$ while $g_2$ is a helper assertion. This may be justified by thinking
of $M_1$ as an open system, which interacts with an environment over which it
has little control. Thus, the correct operation of $M_1$ may be dependent on the
correct operation of its environment $M_2$; therefore, $g_2$ appears as a guarantee
of the correct operation of the environment in the proof of the correctness of
$M_1$. The appearance of $g_1$ in the proof sub-task for $M_2$ can be similarly justified,
hence the use of the word “circular” in the name for such proof rules. The
circularity helps to more easily encode the back-and-forth handshake protocols
that designers typically use for connecting components of a system.
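The reading of “$q$ constrains $p$” given above can be illustrated on finite prefixes of a computation. The Python function below is only a sketch: LTL is interpreted over infinite sequences, so a finite prefix can exhibit a violation but cannot by itself establish the property. The function name and the trace encoding (lists of Booleans, one entry per point) are assumptions for the example.

```python
def constrains_violated_at(q_trace, p_trace):
    """Return the first point i where p fails although q held at all j < i.

    Returns None if the prefix contains no such point.
    """
    for i, p_holds in enumerate(p_trace):
        if not p_holds:
            return i          # q held at every earlier point, yet p fails here
        if not q_trace[i]:
            return None       # q broke first; later points are unconstrained
    return None


# q holds throughout, p fails at point 2: violation at 2.
first = constrains_violated_at([True, True, True], [True, True, False])
# q fails at point 1, so the failure of p at point 2 is permitted.
second = constrains_violated_at([True, False, True], [True, True, False])
# At point 0 there is no earlier point, so p must hold unconditionally.
third = constrains_violated_at([False], [False])
```

The third case shows why the property is stronger than “$q$ implies $p$ at each point”: $p$ is required at point 0 regardless of $q$, which is what breaks the apparent circularity between the two sub-goals.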

For any proof system, soundness is, of course, the most important property:
it should not be possible to deduce false facts. A measure of the quality or
usefulness of a proof system is obtained from an investigation into completeness:
is it possible to deduce all true facts using the rules of the system? Existing
compositional rules, including the circular ones, are known to be sound.

In this paper, we first investigate the completeness of existing compositional
reasoning rules, focusing on rules that are known to be sound for arbitrary linear
temporal properties and which have been used successfully to verify large
systems. Surprisingly, several such rules turn out to be incomplete; that is, there are
correctness assertions about $M_1 // M_2$ which are true but are not provable from
the proof rules. Typically these unprovable assertions are liveness properties, but
some rules may also be incomplete for safety properties. The counter-examples
for incompleteness are also quite simple, which indicates that the rules may be
inadequate for handling many compositions that arise in practice. We propose
a new circular reasoning rule similar to the one above and show that it is both
sound and complete. The new rule strengthens the previous rule in a manner
analogous to strengthening a proof of invariance by introducing auxiliary
assertions: to show $Gp$, show $G(p \land h)$. Furthermore, our new rule is backward
compatible, in that any proof done using the previous rule is also a proof with
the new circular rule.

We then investigate whether circularity is, in itself, essential for reasoning 
about composed systems. We show that the notion of circularity is a somewhat 
weak one for LTL properties, in that proofs carried out with circular rules can 
be efficiently translated to proofs with non-circular rules, and vice-versa. 

The paper is organized as follows: Section 2 contains some preliminary defi- 
nitions; Section 3 gives the details of several different styles of proof rules and 
develops our new sound and complete circular proof rule; Section 4 discusses 
the translations between proofs carried out with circular and non-circular rules. 
Finally, Section 5 contains a brief conclusion and discusses related work. 

2 Background 

In this section, we define the computational model and provide examples of 
circular and non-circular rules for compositional reasoning. 

2.1 Temporal Logic 

LTL was first suggested as a protocol specification language in [Pnu77]. Formulae
in the logic define sets of infinite sequences. We define LTL formulae w.r.t. a set
of variable symbols. As in first-order logic, one can construct terms over the set
of variables using function symbols from a vocabulary $\mathcal{F}$, and atomic predicates
from terms, using relational symbols from a vocabulary $\mathcal{R}$. We define atomic
predicates and temporal formulas below. A predicate is a boolean combination
of atomic predicates.

— For a relational symbol $r \in \mathcal{R}$ of arity $n$ and terms $t_0, \ldots, t_{n-1}$,
$r(t_0, \ldots, t_{n-1})$ is an atomic predicate and a formula,

— for formulae $f$ and $g$, $(f \land g)$ and $\neg(f)$ are formulae,

— for formulae $f$ and $g$, $X(f)$, $X^-(f)$, $(f\ U\ g)$, and $(f\ S\ g)$ are formulae.

The temporal operators are $\mathsf{X}$ (next-time), $\mathsf{X}^{-}$ (previous-time), $\mathsf{U}$ (until), and
$\mathsf{S}$ (since). Given an interpretation $\mathcal{I}$ (which we assume fixed from now on)
for the function and relation symbols, temporal formulae are interpreted w.r.t.
infinite sequences of valuations of the variables. For a set of typed variables
$W$, let a $W$-state be a function mapping each variable in $W$ to a value in its
type. The set of $W$-states is denoted by $S(W)$. A $W$-sequence $\sigma$ is an infinite
sequence of $W$-states, which is represented as a function $\sigma : \mathbb{N} \to S(W)$ ($\mathbb{N}$ is
the set of natural numbers). We write $\sigma, i \models f$ to say that the infinite sequence
$\sigma$ satisfies the formula $f$ at position $i$. The language of $f$, denoted by $\mathcal{L}(f)$, is the
set $\{\sigma : \sigma, 0 \models f\}$. The satisfaction relation can be defined by induction on the
structure of $f$. First, the value of a term $t$ at location $i$ on $\sigma$, denoted as $\sigma_i(t)$,
may be defined by induction on the structure of terms. Next, the satisfaction
relation for formulas is defined as follows.

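- $\sigma, i \models r(t_0, \ldots, t_{n-1})$ iff $(\mathcal{I}(r))(\sigma_i(t_0), \ldots, \sigma_i(t_{n-1}))$ is true.

- $\sigma, i \models \neg(f)$ iff $\sigma, i \models f$ is false; $\sigma, i \models (f \wedge g)$ iff both $\sigma, i \models f$ and $\sigma, i \models g$
  are true.

- $\sigma, i \models \mathsf{X}(f)$ iff $\sigma, i + 1 \models f$.

- $\sigma, i \models \mathsf{X}^{-}(f)$ iff $i > 0$ and $\sigma, i - 1 \models f$.

- $\sigma, i \models (f\,\mathsf{U}\,g)$ iff there exists $j$, $j \geq i$, such that $\sigma, j \models g$ and for every $k$,
  $i \leq k < j$, $\sigma, k \models f$.

- $\sigma, i \models (f\,\mathsf{S}\,g)$ iff there exists $j$, $j \leq i$, such that $\sigma, j \models g$ and for every $k$,
  $j < k \leq i$, $\sigma, k \models f$.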

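Other connectives can be defined in terms of these basic connectives: $(f \vee g)$ is
$\neg(\neg f \wedge \neg g)$, $(f \Rightarrow g)$ is $\neg f \vee g$, $\mathsf{F}g$ (“eventually $g$”) is $(\mathit{true}\,\mathsf{U}\,g)$, $\mathsf{F}^{-}g$ (“previously
$g$”) is $(\mathit{true}\,\mathsf{S}\,g)$, $\mathsf{G}f$ (“always $f$”) is $\neg\mathsf{F}(\neg f)$, $(f\,\mathsf{W}\,g)$ (“$f$ holds unless $g$”)
is $(\mathsf{G}(f) \vee (f\,\mathsf{U}\,g))$, $\overset{\infty}{\mathsf{F}}p$ (“infinitely often $p$”) is $\mathsf{GF}p$, $\overset{\infty}{\mathsf{G}}p$ (“finitely often $\neg p$”) is
$\mathsf{FG}p$, and $q \triangleright p$ (read as “$q$ constrains $p$”) is $\neg(q\,\mathsf{U}\,\neg p)$.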
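The future-time operators above can be made concrete with a small evaluator. The following is only an illustrative sketch, not part of the paper: it assumes boolean variables, restricts attention to ultimately periodic sequences (a finite prefix followed by a loop repeated forever), and omits the past operators. The names `Lasso`, `atom`, `until` and `constrains` are hypothetical. Until is computed as a least fixpoint of $g \vee (f \wedge \mathsf{X}(f\,\mathsf{U}\,g))$ over the finitely many stored positions.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]


class Lasso:
    """Ultimately periodic sequence: `prefix` followed by `loop` repeated forever."""

    def __init__(self, prefix: List[State], loop: List[State]) -> None:
        if not loop:
            raise ValueError("loop must be non-empty")
        self.states = prefix + loop
        self.loop_start = len(prefix)

    def succ(self, i: int) -> int:
        """Index of the stored position that follows position i."""
        return i + 1 if i + 1 < len(self.states) else self.loop_start


# A formula maps a lasso to one truth value per stored position.
Formula = Callable[[Lasso], List[bool]]


def atom(name: str) -> Formula:
    return lambda w: [s[name] for s in w.states]


def true_() -> Formula:
    return lambda w: [True] * len(w.states)


def neg(f: Formula) -> Formula:
    return lambda w: [not v for v in f(w)]


def conj(f: Formula, g: Formula) -> Formula:
    return lambda w: [a and b for a, b in zip(f(w), g(w))]


def nxt(f: Formula) -> Formula:
    def ev(w: Lasso) -> List[bool]:
        fv = f(w)
        return [fv[w.succ(i)] for i in range(len(w.states))]
    return ev


def until(f: Formula, g: Formula) -> Formula:
    def ev(w: Lasso) -> List[bool]:
        fv, gv = f(w), g(w)
        val = [False] * len(w.states)  # least fixpoint, grown from False
        changed = True
        while changed:
            changed = False
            for i in range(len(w.states)):
                new = gv[i] or (fv[i] and val[w.succ(i)])
                if new != val[i]:
                    val[i], changed = new, True
        return val
    return ev


def F(g: Formula) -> Formula:          # eventually g = (true U g)
    return until(true_(), g)


def G(f: Formula) -> Formula:          # always f = not F(not f)
    return neg(F(neg(f)))


def constrains(q: Formula, p: Formula) -> Formula:   # q |> p = not (q U not p)
    return neg(until(q, neg(p)))
```

On a sequence where $q$ fails before $p$ first fails, `constrains(atom("q"), atom("p"))` holds at position 0 even though `G(atom("p"))` does not, which is the sense in which $q \triangleright p$ is weaker than $\mathsf{G}p$.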

Quantified Temporal Logic: The expressive power of temporal logic can
be enhanced by allowing variable quantification. The formula $(\exists W : f)$ is true
of a $V$-sequence $\sigma$ iff there is a $V \cup W$-sequence $\delta$ that agrees on the $V$-variables
with $\sigma$ and which satisfies $f$. In the finite case, the expressive power of quantified
temporal logic is that of $\omega$-regular expressions; see [Tho90] for a survey of these
issues.

2.2 Computational Model 

We adopt a definition of a process similar to those in [Pnu77, AL95, McM99]. A
process is specified by giving an initial condition, a transition condition and a 
fairness condition over a set of variables. 

Definition 0 (Process) A process is specified by a tuple $(V, I, T, F)$ where

- $V$ is a finite, nonempty set of typed variables. We define a set of primed
  variables $V'$ that is in 1-1 correspondence with $V$.

- $I(V)$, the initial condition, is a predicate on $V$,

- $T(V, V')$, the transition condition, is a predicate on $V \cup V'$, which is left-total.

- $F(V, V')$, the fairness condition, is a boolean combination of temporal
  formulas $\overset{\infty}{\mathsf{F}}(p)$ and $\overset{\infty}{\mathsf{G}}(p)$, for predicates $p$ on $V \cup V'$.

For $W$ such that $V \subseteq W$, a $W$-computation $\sigma$ of a process is a $W$-sequence
such that $I(\sigma_0)$ holds and, for each $i \in \mathbb{N}$, $T(\sigma_i, \sigma_{i+1})$ holds. By considering $x' \in V'$ as a
term that specifies the value of $x \in V$ in the next state, the set of computations
can be defined by the temporal formula $I \wedge \mathsf{G}(T)$, interpreted over $W$-sequences.

Definition 1 (Language) For a set of variables $W$ such that $V \subseteq W$, the
$W$-language of a process $M = (V, I, T, F)$, denoted by $\mathcal{L}_W(M)$, is the set of
$W$-computations of $M$ that satisfy the fairness condition $F$. Thus, $\mathcal{L}_W(M)$ can be
expressed by the LTL formula $I \wedge \mathsf{G}(T) \wedge F$, interpreted over $W$-sequences.

We define process composition so that the language of a composition $M_1 // M_2$
is the intersection of the languages of $M_1$ and $M_2$. The semantics of most hardware
description languages follows this model. In addition, as shown in [AL95],
with some reasonable restrictions, it holds also of asynchronous models of
computation.

Definition 2 (Process Composition) The composition of processes $M_1 =
(V_1, I_1, T_1, F_1)$ and $M_2 = (V_2, I_2, T_2, F_2)$ is denoted by $M_1 // M_2$, and is defined
as the process $(V, I, T, F)$ where

- $V = V_1 \cup V_2$,

- $I = I_1 \wedge I_2$,

- $T = T_1 \wedge T_2$,

- $F = F_1 \wedge F_2$.

With this definition of composition, it is possible that $T$ is not left-total even
though $T_1$ and $T_2$ are left-total. In the rest of the paper, we restrict ourselves to
those compositions where $T$ is left-total.

Theorem 0 For a composition $M = M_1 // M_2$ and a set of variables $W$ such
that $(V_1 \cup V_2) \subseteq W$, $\mathcal{L}_W(M) = \mathcal{L}_W(M_1) \cap \mathcal{L}_W(M_2)$. □



Definition 3 (Model Checking) The model checking question is to determine
if a property $f$ defined over a variable set $W$ is true of all computations of a
program $M$ with a variable set that is a subset of $W$; i.e., if $(\forall W : \mathcal{L}_W(M) \Rightarrow f)$
holds.

2.3 Compositional Reasoning 

The model checking question for a composition $M_1 // M_2$ may be phrased as
$(\forall W : \mathcal{L}_W(M_1 // M_2) \Rightarrow f)$, which is equivalent, by Theorem 0, to
$(\forall W : \mathcal{L}_W(M_1) \wedge \mathcal{L}_W(M_2) \Rightarrow f)$. Compositional reasoning rules convert this question
into two separate model checking questions, one explicitly involving $M_1$ and the
other explicitly involving $M_2$. This separation is typically required in the proofs
of large systems because even the symbolic representation (as BDDs) of the
transition relation for $M_1 // M_2$ is infeasible.

Assume-guarantee rules for composition attempt to generalize the pre- and
post-condition reasoning of Hoare logic [Hoa69]. Informally, a triple $\{f\}M\{g\}$
asserts the property that every computation of $M$ which satisfies the assumption $f$
satisfies the guarantee $g$. Formally, this can be stated as $(\forall W : f \wedge \mathcal{L}_W(M) \Rightarrow g)$.
We state below two typical compositional reasoning rules based on the
assume-guarantee formulation: the first is syntactically non-circular, while the second is
syntactically circular, in that the assumptions of one process form the guarantees
of the other and vice-versa.

Definition 4 (Non-circular Reasoning (NC)) Show $\{f\}M_1 // M_2\{g\}$ holds
by picking an intermediate property $h$ such that $\{f\}M_1\{h\}$ and $\{h\}M_2\{g\}$ hold.

Rule NC is sound, which can be shown with simple propositional reasoning
from the definitions. It is also (trivially) complete; if $\{f\}M_1 // M_2\{g\}$ holds, then
choosing $h = f \wedge \mathcal{L}_W(M_1)$, one may show $\{f\}M_1\{h\} = (\forall W : f \wedge \mathcal{L}_W(M_1) \Rightarrow
f \wedge \mathcal{L}_W(M_1)) = \mathit{true}$, and $\{h\}M_2\{g\} = (\forall W : f \wedge \mathcal{L}_W(M_1) \wedge \mathcal{L}_W(M_2) \Rightarrow g)$,
which is true by the assumption.
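The propositional reasoning behind the soundness of rule NC can be spelled out as follows. This is a sketch of one way to write the argument, using only the definition of the triple and Theorem 0; the layout is ours and not taken from the paper.

```latex
\begin{align*}
\{f\}M_1\{h\} &\;\equiv\; (\forall W : f \wedge \mathcal{L}_W(M_1) \Rightarrow h) \\
\{h\}M_2\{g\} &\;\equiv\; (\forall W : h \wedge \mathcal{L}_W(M_2) \Rightarrow g) \\
f \wedge \mathcal{L}_W(M_1) \wedge \mathcal{L}_W(M_2)
  &\;\Rightarrow\; h \wedge \mathcal{L}_W(M_2) \;\Rightarrow\; g \\
\{f\}M_1 /\!/ M_2\{g\}
  &\;\equiv\; (\forall W : f \wedge \mathcal{L}_W(M_1 /\!/ M_2) \Rightarrow g) \\
  &\;\equiv\; (\forall W : f \wedge \mathcal{L}_W(M_1) \wedge \mathcal{L}_W(M_2) \Rightarrow g)
\end{align*}
```

The third line chains the two premises; the last two lines use Theorem 0 to replace the language of the composition by the conjunction of the component languages.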

The following rule is derived from the application of a property decomposition
theorem to compositional reasoning in [McM99] (cf. Theorem 1 in [McM99]).
Below we make use of the following notation: let $B = \{f_1, \ldots, f_k\}$ be a set of
LTL formulae, then the formula $B$ is a shorthand for $(\bigwedge i : f_i)$.

Definition 5 (Syntactically Circular Reasoning (C1)) Consider the
composition $M = (// j : M_j)$. Let $\{g_i\}$ be a set of properties. To show that
$\{f\}M\{\mathsf{G}(\bigwedge i : g_i)\}$ holds,

- with each $i$, choose a composition $M(i) = (// k : M_k)$ where $k$ ranges over a
  strict subset of the process indices,

- choose a well founded order $\prec$ and subsets $\Theta_i$ and $\Delta_i$ of the set of properties
  $\{g_i\}$, such that if $g_j \in \Theta_i$, then $j \prec i$,

and show that $\{f\}M(i)\{\Delta_i \triangleright (\neg\Theta_i \vee g_i)\}$ holds for all $i$.

The requirement that $M(i)$ is a strict sub-composition of $M$ is imposed to
prevent trivial applications of this rule with every $M(i)$ equal to $M$.



3 (In)Complete Proof Rules

We have shown in the previous section that the non-circular rule NC is both 
sound and complete. In this section, we consider the proof rule C1 and the
assume-guarantee rule from [AL95] and show that these circular rules are in- 
complete, even for finite-state processes. Our choice of these rules is guided by 
two considerations: (i) these rules, unlike many other compositional rules (as 
discussed in Section 5), are sound for arbitrary linear temporal properties and 
(ii) they have been used successfully (cf. [McM98]) to verify large systems. We 
then present a new sound and complete circular rule.

3.1 Incompleteness

To demonstrate that rule C1 is incomplete, consider the programs below where,
informally, $M_1$ and $M_2$ juggle four tokens by throwing them back and forth in a
circular pattern (the $l$, $r$ variables indicate left and right “hands”, respectively).

program $M_1$
  variables $l_1, r_1, r_2$ : boolean
  initially $l_1 \wedge r_1$
  transition $(l_1' = r_2) \wedge (r_1' = l_1)$

program $M_2$
  variables $l_2, r_2, r_1$ : boolean
  initially $l_2 \wedge r_2$
  transition $(l_2' = r_1) \wedge (r_2' = l_2)$

As can be checked easily, the property $\mathsf{G}(l_1 \wedge l_2)$ holds of the composition
$M_1 // M_2$. Applying the substitution $f = \mathit{true}$, $g_1 = l_1$, $g_2 = l_2$ to rule C1, we
obtain the property $\{\mathit{true}\}M_1 // M_2\{\mathsf{G}(l_1 \wedge l_2)\}$. However, as can be checked by
enumeration, there is no way to define the well founded order and the subsets
$\Theta_i$ and $\Delta_i$ such that the sub-goals of rule C1 are satisfied. Intuitively, this is
because the next value of $l_1$ is determined by the current value of $r_2$, which is
unconstrained by the assumptions. Hence, the original property, which is true of
the composition, cannot be shown using the proof rule C1.
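The juggling example is small enough to check by brute force. The sketch below is an illustration under our own assumptions, not the authors' procedure: primed variables are read as values in the next state, states range over all four booleans, and the helper names (`reachable`, `holds_constrains`) are hypothetical. It confirms that every reachable state of the composition satisfies $l_1 \wedge l_2$, and that one natural instance of a C1 sub-goal for $M_1$ alone, $g_2 \triangleright g_1$, fails. It tests a single instantiation only and does not replace the enumeration mentioned above.

```python
from itertools import product

VARS = ("l1", "r1", "l2", "r2")
STATES = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=len(VARS))]

# Processes as (initial predicate, transition predicate over current s and next t).
def init1(s): return s["l1"] and s["r1"]
def trans1(s, t): return t["l1"] == s["r2"] and t["r1"] == s["l1"]
def init2(s): return s["l2"] and s["r2"]
def trans2(s, t): return t["l2"] == s["r1"] and t["r2"] == s["l2"]

def both(p, q):
    """Conjunction of two predicates of the same arity (composition M1 // M2)."""
    return lambda *args: p(*args) and q(*args)

def reachable(init, trans):
    """All states reachable from an initial state."""
    todo = [s for s in STATES if init(s)]
    seen = []
    while todo:
        s = todo.pop()
        if s in seen:
            continue
        seen.append(s)
        todo.extend(t for t in STATES if trans(s, t))
    return seen

def holds_constrains(init, trans, q, p):
    """Check q |> p on all computations, for state predicates q and p:
    p must hold at every position reached while q held at all earlier ones."""
    todo = [s for s in STATES if init(s)]
    seen = []
    while todo:
        s = todo.pop()
        if s in seen:
            continue
        seen.append(s)
        if not p(s):
            return False
        if q(s):  # only continue along paths on which the assumption still holds
            todo.extend(t for t in STATES if trans(s, t))
    return True

composition_ok = all(s["l1"] and s["l2"]
                     for s in reachable(both(init1, init2), both(trans1, trans2)))
c1_subgoal_ok = holds_constrains(init1, trans1,
                                 q=lambda s: s["l2"], p=lambda s: s["l1"])
```

Here `composition_ok` evaluates to true and `c1_subgoal_ok` to false: with $r_2$ unconstrained, $M_1$ can move to a state where $l_1$ is false while $l_2$ has held so far.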

One reason that this rule is incomplete is that it does not permit a choice
of auxiliary assertions, as in rule NC. If auxiliary assertions were allowed, it is
easy to see that the strengthened property $\mathsf{G}(l_1 \wedge r_1 \wedge l_2 \wedge r_2)$ can be shown
by properly instantiating rule C1. This is similar to the incompleteness of the
inductive invariance rule for establishing $\mathsf{G}(p)$; one often needs to strengthen $p$
to $p \wedge h$ and show that this strengthened formula is an inductive invariant. We
say more about strengthening in Section 3.2.
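The analogy with inductive invariants can be seen on the same example. In the following sketch (our own illustration; `is_inductive` is a hypothetical helper), $l_1 \wedge l_2$ holds on all reachable states of the composition but is not inductive, whereas the strengthened assertion $l_1 \wedge r_1 \wedge l_2 \wedge r_2$ is.

```python
from itertools import product

VARS = ("l1", "r1", "l2", "r2")
STATES = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=len(VARS))]

def init(s):
    # Initial condition of the composition: I1 and I2.
    return s["l1"] and s["r1"] and s["l2"] and s["r2"]

def trans(s, t):
    # Transition condition of the composition: T1 and T2 (primes read as next state).
    return (t["l1"] == s["r2"] and t["r1"] == s["l1"]
            and t["l2"] == s["r1"] and t["r2"] == s["l2"])

def is_inductive(inv):
    """inv holds initially and is preserved by every transition from an inv-state."""
    base = all(inv(s) for s in STATES if init(s))
    step = all(inv(t) for s in STATES if inv(s) for t in STATES if trans(s, t))
    return base and step

weak = lambda s: s["l1"] and s["l2"]
strong = lambda s: s["l1"] and s["r1"] and s["l2"] and s["r2"]

weak_is_inductive = is_inductive(weak)
strong_is_inductive = is_inductive(strong)
```

The weak assertion fails the step check from any state with $l_1 \wedge l_2$ true and $r_2$ false, which is unreachable but not excluded by the assertion itself.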

The circular proof rule presented in [AL95] is given below. In this rule, $//$
represents asynchronous composition and $\mathcal{L}(M)$ is defined so that it is insensitive
to stuttering; however, $\mathcal{L}(M_1 // M_2)$ is defined so that it equals $\mathcal{L}(M_1) \wedge \mathcal{L}(M_2)$.
Although this rule allows the choice of auxiliary assertions $E_i$, it turns out that
it is still incomplete, because of the restricted form of the hypotheses. In the
definition below, $C(f)$, for an LTL property $f$, is the strongest safety property
that is weaker than $f$, while $f_{+v}$ asserts that if the formula $f$ should become false
then the variable $v$ becomes constant (for more details see [AL95]).

Definition 6 (Circular Rule C2 [AL95]) To show that $E \wedge \mathcal{L}(N_1 // N_2) \Rightarrow
\mathcal{L}(M_1 // M_2)$ holds, pick $E_i$ and show that for each $i$ in $\{1,2\}$ all the following
hold.

- $C(E) \wedge \bigwedge_{j} C(\mathcal{L}(M_j)) \Rightarrow E_i$

- $C(E_i)_{+v} \wedge C(\mathcal{L}(N_i)) \Rightarrow C(\mathcal{L}(M_i))$

- $E_i \wedge \mathcal{L}(N_i) \Rightarrow \mathcal{L}(M_i)$

Consider the programs $M_1$, $M_2$, $N_1$ and $N_2$ given below, all of which have
initial condition $\mathit{true}$ and a weak-fairness condition on the actions $a_1$, $a_2$.

program $M_1$
  variables $x$ : boolean
  transition $a_1$: $x := \mathit{true}$, $b_1$: $x := \mathit{false}$

program $M_2$
  variables $y$ : boolean
  transition $a_2$: $y := \mathit{true}$, $b_2$: $y := \mathit{false}$

Thus, the specification programs $M_1$ and $M_2$ define the properties $\mathsf{GF}x$ and
$\mathsf{GF}y$ respectively. The implementation programs $N_1$ and $N_2$ are as follows.

program $N_1$
  variables $x, y$ : boolean
  transition $a_1$: $x := y$

program $N_2$
  variables $x, y$ : boolean
  transition $a_2$: $y := \mathit{true}$, $b_2$: $y := \neg x \wedge y$

It is easy to check that $E \wedge \mathcal{L}(N_1 // N_2) \Rightarrow \mathcal{L}(M_1 // M_2)$ with $E = \mathit{true}$. By a
result in [AL95], $C(\mathcal{L}(M))$ is just the temporal formula for process $M$ without the
fairness condition. Thus, $C(\mathcal{L}(M_1)) = \mathit{true} \wedge \mathsf{G}(x' = \mathit{true} \vee x' = \mathit{false} \vee x' = x)$,
which simplifies to $\mathit{true}$, as does $C(\mathcal{L}(M_2))$. Hence, if the first hypothesis is
to hold, both $E_1$ and $E_2$ must equal $\mathit{true}$. Therefore, by the third hypothesis,
$\mathcal{L}(N_1) \Rightarrow \mathcal{L}(M_1)$, which is false as $N_1$ admits computations where $x$ is false at
every point. Hence, rule C2 is incomplete.



3.2 A Sound and Complete Circular Rule 

We present a rule that allows the choice of auxiliary properties over process
interfaces, while retaining the overall style of the proof obligations of rule C1. This
rule is shown to be sound and complete. For clarity, we restrict the discussion
below to the two process case; there is a straightforward generalization to the
$n$ process case.

Definition 7 (Circular Reasoning (C3)) For properties $g_1$ over $V_1$ and $g_2$
over $V_2$, to show $\{f\}M_1 // M_2\{\mathsf{G}(g_1 \wedge g_2)\}$, pick properties $h_1$ and $h_2$ for which
the following obligations hold.

1. $\{f\}M_1\{ (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \}$

2. $\{f\}M_2\{ (h_1 \wedge g_1) \triangleright ((h_1 \Rightarrow g_2) \wedge h_2) \}$

Note that an instance of rule C1 can be obtained from C3 by making the
substitution $h_1 = \mathit{true}$, $h_2 = \mathit{true}$.

Theorem 1 (Soundness) Rule C3 is sound.

Proof. Assume that the hypotheses of the rule hold for some choice of $h_1$ and
$h_2$. Then the guarantees of both hypotheses are true for any computation of
$M_1 // M_2$ that satisfies $f$. Consider any such computation $\sigma$, and any point $i$ on
$\sigma$. Assume inductively that for all points $j$, $j < i$, the property $(g_1 \wedge h_1 \wedge g_2 \wedge h_2)$
holds. By the first hypothesis of rule C3, $(h_2 \Rightarrow g_1) \wedge h_1$ holds at point $i$, and by
the second hypothesis, so does $(h_1 \Rightarrow g_2) \wedge h_2$. Together these give
$g_1 \wedge h_1 \wedge g_2 \wedge h_2$ at point $i$. Hence, the inductive hypothesis
holds at point $i + 1$. This shows that $\mathsf{G}(g_1 \wedge h_1 \wedge g_2 \wedge h_2)$ holds of $\sigma$, from which
it follows that the weaker property $\mathsf{G}(g_1 \wedge g_2)$ also holds of $\sigma$. □

Theorem 2 (Completeness) Rule C3 is complete. Furthermore, if $f$ is defined
over the interface variables $V_1 \cap V_2$, it is always possible to choose $h_1$ and
$h_2$ as properties over $V_1 \cap V_2$.

Proof. In the following, let $V = V_1 \cup V_2$, and suppose that $f$ is defined over
$V_1 \cap V_2$. Suppose that $\{f\}M_1 // M_2\{\mathsf{G}(g_1 \wedge g_2)\}$ holds. By definition, this is
equivalent to $(\forall V : f \wedge \mathcal{L}_V(M_1) \wedge \mathcal{L}_V(M_2) \Rightarrow \mathsf{G}(g_1 \wedge g_2))$, which is equivalent to
$(\forall V : f \wedge \mathcal{L}_V(M_1) \wedge \mathcal{L}_V(M_2) \Rightarrow \mathsf{G}(g_1))$ and $(\forall V : f \wedge \mathcal{L}_V(M_1) \wedge \mathcal{L}_V(M_2) \Rightarrow \mathsf{G}(g_2))$
both being true. Consider the first property:

$(\forall V : f \wedge \mathcal{L}_V(M_1) \wedge \mathcal{L}_V(M_2) \Rightarrow \mathsf{G}(g_1))$    (1)

Let $\alpha_1 = (\exists V \setminus V_2 : f \wedge \mathcal{L}_V(M_1))$ and $\alpha_2 = (\exists V \setminus V_1 : f \wedge \mathcal{L}_V(M_2))$. Define $h_1$
as $\mathsf{F}^{-}\alpha_1$ and $h_2$ as $\mathsf{F}^{-}\alpha_2$. As $M_1$ and $g_1$ are defined over $V_1$, expression 1 can be
re-written as:

$(\forall V_1 : \mathcal{L}_V(M_1) \wedge \alpha_2 \Rightarrow \mathsf{G}(g_1))$    (2)

By definition of $\alpha_1$, it is also true that:

$(\forall V : f \wedge \mathcal{L}_V(M_1) \Rightarrow \alpha_1)$    (3)

Consider the first hypothesis: $\{f\}M_1\{ (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \}$, and
consider any sequence satisfying $f \wedge \mathcal{L}_V(M_1)$. By equation 3, $\alpha_1$ is true initially, so
$\mathsf{G}(h_1)$ is true. Now consider an arbitrary position $i$ on the sequence such that
$(h_2 \wedge g_2)$ holds for all positions $j$, $j < i$.

case $i = 0$: We have to show that $(h_2 \Rightarrow g_1)$ holds at position 0. If $h_2$ is true
initially, then $\alpha_2$ is true initially. By equation 2, $g_1$ must then be true.

case $i > 0$: As $i > 0$, $h_2$ holds initially, which implies that $\alpha_2$ is true initially.
By equation 2, $\mathsf{G}(g_1)$ holds at the origin, so that $(h_2 \Rightarrow g_1)$ is true at point $i$.

Hence, the first hypothesis is true. In a similar manner, one may argue that
the second hypothesis is also true; so the rule is complete. Note that the auxiliary
assertions $h_1$ and $h_2$ are defined over the common interface variables $V_1 \cap V_2$. □

For the first example, which showed the incompleteness of rule C1, the
following choices for $h_1$ and $h_2$ over the common variables $\{r_1, r_2\}$ ensure that the
hypotheses of rule C3 hold: $h_1 = r_1$, $h_2 = r_2$.
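For the juggling example, the two obligations of rule C3 with $f = \mathit{true}$, $g_1 = l_1$, $g_2 = l_2$, $h_1 = r_1$ and $h_2 = r_2$ can be checked by exploring each process separately. The code below is a sketch under our own assumptions rather than the paper's method: all properties involved are state predicates, primes are read as next-state values, there is no fairness condition, and `holds_constrains` is a hypothetical helper that explores only those paths on which the left-hand side of $\triangleright$ has held so far.

```python
from itertools import product

VARS = ("l1", "r1", "l2", "r2")
STATES = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=len(VARS))]

def init1(s): return s["l1"] and s["r1"]
def trans1(s, t): return t["l1"] == s["r2"] and t["r1"] == s["l1"]
def init2(s): return s["l2"] and s["r2"]
def trans2(s, t): return t["l2"] == s["r1"] and t["r2"] == s["l2"]

def implies(a, b):
    return (not a) or b

def holds_constrains(init, trans, q, p):
    """Check q |> p on every computation of a single process.
    Variables the process does not constrain range freely in the next state."""
    todo = [s for s in STATES if init(s)]
    seen = []
    while todo:
        s = todo.pop()
        if s in seen:
            continue
        seen.append(s)
        if not p(s):
            return False
        if q(s):
            todo.extend(t for t in STATES if trans(s, t))
    return True

# Obligation 1: {true} M1 { (h2 and g2) |> ((h2 => g1) and h1) }
obligation1 = holds_constrains(
    init1, trans1,
    q=lambda s: s["r2"] and s["l2"],
    p=lambda s: implies(s["r2"], s["l1"]) and s["r1"])

# Obligation 2: {true} M2 { (h1 and g1) |> ((h1 => g2) and h2) }
obligation2 = holds_constrains(
    init2, trans2,
    q=lambda s: s["r1"] and s["l1"],
    p=lambda s: implies(s["r1"], s["l2"]) and s["r2"])
```

Both checks succeed, and neither one builds the transition relation of $M_1 // M_2$; each explores one process with the other's variables left free.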

For the second example, which showed the incompleteness of rule C2, the
property to be satisfied by $N_1 // N_2$ may be written as $\mathsf{G}(\mathsf{GF}x \wedge \mathsf{GF}y)$. The
following choices for $h_1$ and $h_2$ over the common variables $x$, $y$ ensure that the
hypotheses of rule C3 hold: $h_1 = \mathit{true}$, $h_2 = \mathsf{GF}y$.

Interestingly¹, if we modify the hypotheses of rule C3 so that they have
the form $\{f\}M_1\{\mathsf{G}(h_1) \wedge (g_2 \wedge h_2) \triangleright (h_2 \Rightarrow g_1)\}$, which is also sound and
complete, then these hypotheses may be obtained from rule C1 for the
composition $M_1 // M_2$ with the following substitutions: $g_1 = g_1$, $g_2 = g_2$, $g_3 = h_1$, $g_4 =
h_2$, $\Delta_1 = \{g_2, g_4\}$, $\Theta_1 = \{g_4\}$, $\Delta_2 = \{g_1, g_3\}$, $\Theta_2 = \{g_3\}$. The other $\Theta$ and $\Delta$
variables equal $\emptyset$. We also choose $M(1) = M_1$, $M(2) = M_2$, $M(3) = M_1$, $M(4) =
M_2$ and the relation $3 \prec 2$, $4 \prec 1$. This shows that modifying rule C1 by
allowing the $g_i$'s to be augmented with auxiliary assertions $h_i$ to form $(g_i \wedge h_i)$
results in a sound and complete rule.

¹ We thank an anonymous referee for this observation.

4 Translating Proofs

In this section we show how proofs derived using the circular rule C3 can be
translated into proofs using the non-circular rule NC and vice-versa. We also
discuss some of the consequences of these translations. In the sequel, when $W$ is
clear from the context, we will write $(\forall W : f)$ simply as $f$.

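Theorem 3 (From Circular to Non-Circular) Suppose $\{f\}M_1 // M_2\{\mathsf{G}(g_1 \wedge g_2)\}$
has been derived from the circular rule C3. Then $\{f\}M_1 // M_2\{\mathsf{G}(g_1 \wedge g_2)\}$ may
be derived by application of the rule NC by letting the intermediate assertion $h$
equal $f \wedge (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1)$.

Proof. $\{f\}M_1\{f \wedge (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1)\}$, the first requirement of rule
NC, follows as a direct result of the first proof obligation from C3 in the premise.
We now show why the second requirement of rule NC also holds.

  $\{f \wedge (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1)\}M_2\{\mathsf{G}(g_1 \wedge g_2)\}$

$\equiv$ ( by the definition of $\{f\}M\{g\}$ )

  $\{f\}M_2\{ (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \Rightarrow \mathsf{G}(g_1 \wedge g_2) \}$

$\Leftarrow$ ( by the second proof obligation of rule C3 )

  $(h_1 \wedge g_1) \triangleright ((h_1 \Rightarrow g_2) \wedge h_2) \Rightarrow$
  $[\, (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \Rightarrow \mathsf{G}(g_1 \wedge g_2) \,]$

$\equiv$ ( re-arranging )

  $[\, (h_1 \wedge g_1) \triangleright ((h_1 \Rightarrow g_2) \wedge h_2) \,] \wedge [\, (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \,]$
  $\Rightarrow \mathsf{G}(g_1 \wedge g_2)$

$\equiv$ ( temporal logic (see proof of Theorem 1) )

  $\mathit{true}$

□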



Theorem 4 (From Non-Circular to Circular) Suppose $\{f\}M_1 // M_2\{g\}$ has been
derived from the non-circular rule NC using the intermediate assumption $h$.
Then the conclusion $\{f\}M_1 // M_2\{g\}$ may be derived by application of the rule
C3 using the substitution $h_1 = \mathsf{F}^{-}h$, $h_2 = \mathit{true}$, $g_1 = \mathit{true}$ and $g_2 = \mathsf{F}^{-}g$.

Proof. Firstly, we note that the conclusion of the rule is the desired one. It
is straightforward to show that, at the initial point, any computation satisfies
$\mathsf{G}\mathsf{F}^{-}g$ iff it satisfies $g$. Therefore, $\{f\}M_1 // M_2\{\mathsf{G}(g_1 \wedge g_2)\}$ is equivalent to
$\{f\}M_1 // M_2\{g\}$.
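The equivalence used here can be unfolded from the semantics of Section 2.1. The derivation below is a sketch that assumes the past operator $\mathsf{F}^{-}$ includes the present position, i.e. $\sigma, i \models \mathsf{F}^{-}g$ iff $\sigma, j \models g$ for some $j \leq i$.

```latex
\begin{align*}
\sigma, 0 \models \mathsf{G}\,\mathsf{F}^{-} g
  &\iff \forall i \geq 0 .\; \sigma, i \models \mathsf{F}^{-} g \\
  &\iff \forall i \geq 0 .\; \exists j \leq i .\; \sigma, j \models g \\
  &\iff \sigma, 0 \models g
\end{align*}
```

For the last step, taking $i = 0$ forces $j = 0$, which gives the left-to-right direction; conversely, if $g$ holds at position 0 then $j = 0$ is a witness for every $i$.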

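Consider the first proof obligation.

  $\{f\}M_1\{ (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \}$

$\equiv$ ( substituting and simplifying )

  $\{f\}M_1\{ g_2 \triangleright h_1 \}$

$\Leftarrow$ ( $\triangleright$ is anti-monotone in its first argument )

  $\{f\}M_1\{ \mathit{true} \triangleright \mathsf{F}^{-}(h) \}$

$\equiv$ ( as $\mathit{true} \triangleright \mathsf{F}^{-}(h) \equiv \mathsf{G}\mathsf{F}^{-}(h)$ )

  $\{f\}M_1\{ \mathsf{G}\mathsf{F}^{-}(h) \}$

$\equiv$ ( as $\mathsf{G}\mathsf{F}^{-}(p)$ and $p$ are equivalent at the initial point )

  $\{f\}M_1\{h\}$

$\equiv$ ( by the first premise in the supposition )

  $\mathit{true}$

Now we show that the second premise is also true.

  $\{f\}M_2\{ (h_1 \wedge g_1) \triangleright ((h_1 \Rightarrow g_2) \wedge h_2) \}$

$\equiv$ ( substituting and simplifying )

  $\{f\}M_2\{ h_1 \triangleright (h_1 \Rightarrow g_2) \}$

$\equiv$ ( definition of $h_1$, $g_2$ )

  $\{f\}M_2\{ \mathsf{F}^{-}(h) \triangleright (\mathsf{F}^{-}(h) \Rightarrow \mathsf{F}^{-}(g)) \}$

This follows by an induction on positions of any sequence satisfying $f \wedge
\mathcal{L}(M_2)$. At the initial position, by the second part of NC, $(h \Rightarrow g)$ holds, thus,
$(\mathsf{F}^{-}h \Rightarrow \mathsf{F}^{-}g)$ holds at the initial position. At any other position $i$, by the
assumption $\mathsf{F}^{-}h$ for 0 up to $i - 1$, $h$ must be true initially, hence, $g$ is true
initially by the second part of NC, so that $\mathsf{F}^{-}g$ is true at position $i$. □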

We note that the translations make no use of quantified formulae, which 
justifies the following corollary. 

Corollary 0 Compositional Rule C3 is complete for linear temporal logic.

Proof. As in the proof of the previous theorem, one can write a property $f$
as $\mathsf{G}(\mathit{true} \wedge \mathsf{F}^{-}(f))$ and apply rule C3. Since C3 was shown to be complete for
properties of the form $\mathsf{G}(g_1 \wedge g_2)$ it follows that C3 is sufficient for proving any
linear temporal logic property. We note that the proof of Theorem 2, presented
above, does make use of quantified temporal properties. This use of quantification
is beneficial, in that the properties constructed in the proof of completeness
may be restricted to the variables which are mentioned in the interface between
$M_1$ and $M_2$. However, it is possible to prove Theorem 2 without reference to
quantified formulae (this proof has been left out for space reasons) and hence
the result follows. □

4.1 The Cost of a Proof 

In this section we discuss the computational cost of applying the various rules. 
The goal is to calculate the cost of translating an instance of the use of one proof 
rule into the use of another proof rule and to get a measure on the complexity 
of a proof rule. 

Consider the proof obligation $\{f\}M\{g\}$ where $f$ and $g$ are pure LTL
formulae (they contain no quantifiers). If such a proof were done by hand then
any complexity measure would be, at best, highly subjective. Suppose, on the
other hand, that this proof were given to a model checker. The obligation
amounts to showing that every $W$-computation of $M$ that satisfies $f$ also satisfies
$g$. Typically, this is done as follows [VW86]. First, translate $f$, $M$ and $\neg g$ into
automata $A_f$, $A_M$ and $A_{\neg g}$, respectively, such that the set of sequences accepted
by the automaton is exactly the set of sequences accepted by the corresponding
formula. Then $\mathcal{L}(A_f \cap A_M \cap A_{\neg g}) = \emptyset$ iff the set of computations of $M$ which
satisfy $f$ all satisfy $g$. The complexity of the translation is approximately linear
in $|M|$ and exponential in the lengths of $f$ and $g$ [VW86] [LPZ85]. This leads to
the following definition.



Definition 8 (Proof Cost) The cost of a proof obligation $\{f\}M\{g\}$, for temporal
logic formulae $f$ and $g$ and finite state structure $M$ is $2^{|f|+|g|}\,|M|$.

Note that $|q \triangleright p| = |\neg(q\,\mathsf{U}\,\neg(p))|$, which equals $|p| + |q| + 3$. Suppose that we
have translated a circular proof to a non-circular proof via the method outlined
in Theorem 3. Without loss of generality, assume that $|M_1| \leq |M_2|$, $|h_1| \leq |h_2|$
and $|g_1| \leq |g_2|$. The cost, $c$, of showing the premises of rule C3

- $\{f\}M_1\{ (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \}$ and
- $\{f\}M_2\{ (h_1 \wedge g_1) \triangleright ((h_1 \Rightarrow g_2) \wedge h_2) \}$

is bounded by The cost, nc, of showing the translated 

premises for rule NC 

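- $\{f\}M_1\{ f \wedge (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \}$ and

- $\{ f \wedge (h_2 \wedge g_2) \triangleright ((h_2 \Rightarrow g_1) \wedge h_1) \}M_2\{ \mathsf{G}(g_1 \wedge g_2) \}$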

is bounded by 22|/l+3l'*d+4|s2|-en|^2|. 

So the cost of the circular proof is bounded by $2^{a}|M_2|$, where $a$ is a function
linear in the sizes of $f$, $g_i$ and $h_i$, and the cost of the non-circular proof is bounded
by $2^{2a}|M_2|$, so the translation process can be said to be efficient.
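The size identity for $\triangleright$ used in these bounds is easy to confirm mechanically. The sketch below assumes one particular size measure, the number of operator and atom occurrences in the syntax tree, which the paper does not spell out; formulas are represented as nested tuples and the function names are hypothetical.

```python
def size(formula):
    """Count operator and atom occurrences in a formula given as a nested tuple.
    Atoms are strings; compound formulas are (operator, arg1, ...)."""
    if isinstance(formula, str):
        return 1
    return 1 + sum(size(arg) for arg in formula[1:])

def constrains(q, p):
    """q |> p, expanded by its definition not(q U not(p))."""
    return ("not", ("U", q, ("not", p)))

q = ("and", "h2", "g2")
p = ("and", ("or", ("not", "h2"), "g1"), "h1")
overhead = size(constrains(q, p)) - size(q) - size(p)
```

Under this measure `overhead` is 3 for any choice of `q` and `p`: one negation around the until, the until itself, and one negation around `p`.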

Now suppose that we have translated a non-circular proof to a circular proof 
via the method outlined in Theorem 4. The cost, nc, of showing the premises 
for rule NC 

- $\{f\}M_1\{h\}$ and

- $\{h\}M_2\{g\}$

is bounded by The cost, c of showing the simplified translated 

premises for rule C3 

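- $\{f\}M_1\{ \mathsf{F}^{-}(g) \triangleright \mathsf{F}^{-}(h) \}$ and

- $\{f\}M_2\{ \mathsf{F}^{-}(h) \triangleright (\mathsf{F}^{-}(h) \Rightarrow \mathsf{F}^{-}(g)) \}$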

is bounded by Hence translating from circular to non-circular 

is, essentially, efficient.

5 Related Work



There are several proposals for compositional reasoning rules in the literature, 
but only a few investigations of the completeness of these rules - a good survey of 
the field appears in the COMPOS97 proceedings [dRLP97]. The earliest propo- 
sals for assume-guarantee reasoning are from [Jon81,CM81] - these are concerned 
with establishing safety properties of networks of processes. Zwiers’ book [Zwi89] 
contains much of the groundwork necessary for reasoning about compositional 
proof systems. Proofs of the completeness of compositional reasoning systems 
for safety properties are found in [ZdRvE84] [Pan88] [PJ91] [dRdBH+99]. Other 
assume-guarantee rules for safety properties are proposed in [Sta85] [Pnu85] 
[Kur87] [AH96] [McM97]. More general rules that apply to both safety and liven- 
ess properties are proposed in [Pnu85] [Jos87] [CLM89] [GL94] [AL95] [McM99]. 

We have concentrated on the completeness question for general rules that
apply to both safety and liveness properties. As shown in Section 3, the circular
rules in [AL95] and the rule C1 derived from [McM99] are incomplete. The
circular rule presented in [HQRT98] for the simulation-based verification paradigm
is also incomplete; for lack of space, this proof is left for the full paper. The
simplicity of the counter-examples suggests that the incompleteness may indeed
impact the verification of systems in practice. We present a new circular rule,
which is a modification of rule C1, and show it to be sound and complete for all
of LTL; in fact it is straightforward to generalize these ideas so that the rule
is complete for the $\omega$-regular languages. The proofs carried out using rule C1,
including that of the Tomasulo algorithm [McM98], can be carried out in exactly
the same manner with the new rule. We also investigate whether circularity
is, in itself, essential for reasoning about composed systems, and show that for
assume-guarantee reasoning in LTL, the notion of circularity is a somewhat weak
one, in that proofs carried out with circular rules are efficiently translatable into
proofs with non-circular rules, and vice-versa.

The computational complexity of establishing an assume-guarantee triple 
has been studied extensively in [GL94,KV95,KV97], for various combinations of 
specification logics. We have considered a different question, that of the comple- 
xity of translating between proofs obtained with different compositional rules, 
whenever this is possible. 

There are a number of ways one could choose to strengthen the circular proof
rules found in the literature in order to make them complete. We have chosen
one in particular, rule C3. Our choice was motivated by a desire to remain as
close as possible to the spirit of the original circular proof rule, namely, to avoid
the “direct” use of $M_1 // M_2$ when proving properties about this composition.
Specifically, we have decided not to allow the use of temporal implication, $h \Rightarrow g$,
as a proof rule [MP95]. Our results show that implication is not necessary in order
to obtain a sound and complete rule. Furthermore, rules that include implication
may allow the proof of $f \wedge \mathcal{L}_W(M_1) \wedge \mathcal{L}_W(M_2) \Rightarrow g$ to be instantiated directly
as an implication without use of the, hopefully, better rules mentioning only $M_1$
or $M_2$. That this goal has been, to some extent, militated against in our proof
of completeness should not come as a surprise in light of the difficulty of the
problem.

Acknowledgment: The authors thank Willem-Paul de Roever and the anonymous
referees for many interesting and helpful comments.

Abstract. We present an automatic iterative abstraction-refinement methodology
in which the initial abstract model is generated by an automatic analysis of the con- 
trol structures in the program to be verified. Abstract models may admit erroneous 
(or “spurious”) counterexamples. We devise new symbolic techniques which ana- 
lyze such counterexamples and refine the abstract model correspondingly. The 
refinement algorithm keeps the size of the abstract state space small due to the 
use of abstraction functions which distinguish many degrees of abstraction for 
each program variable. We describe an implementation of our methodology in 
NuSMV. Practical experiments including a large Fujitsu IP core design with ab- 
out 500 latches and 10000 lines of SMV code confirm the effectiveness of our 
approach. 



1 Introduction 

The state explosion problem remains a major hurdle in applying model checking to large 
industrial designs. Abstraction is certainly the most important technique for handling this 
problem. In fact, it is essential for verifying designs of industrial complexity. Currently, 
abstraction is typically a manual process, often requiring considerable creativity. In order 
for model checking to be used more widely in industry, automatic techniques are needed 
for generating abstractions. In this paper, we describe an automatic abstraction technique 
for ACTL* specifications which is based on an analysis of the structure of formulas 
appearing in the program (ACTL* is a fragment of CTL* which only allows universal 
quantification over paths). In general, our technique computes an upper approximation of 
the original program. Thus, when a specification is true in the abstract model, it will also 
be true in the concrete design. However, if the specification is false in the abstract model, 
the counterexample may be the result of some behavior in the approximation which is not 
present in the original model. When this happens, it is necessary to refine the abstraction 
so that the behavior which caused the erroneous counterexample is eliminated. The 
main contribution of this paper is an efficient automatic refinement technique which 
uses information obtained from erroneous counterexamples. The refinement algorithm
keeps the size of the abstract state space small due to the use of abstraction functions
which distinguish many degrees of abstraction for each program variable. Practical 
experiments including a large Fujitsu IP core design with about 500 latches and 10000 
lines of SMV code confirm the competitiveness of our implementation. Although our 
current implementation is based on NuSMV, it is in principle not limited to the input 
language of SMV and can be applied to other languages. 

Our paper follows the general framework established by Clarke, Grumberg, and 
Long [10]. We assume that the reader has some familiarity with that framework. In our 
methodology, atomic formulas are automatically extracted from the program that descri- 
bes the model. The atomic formulas are similar to the predicates used for abstraction by 
Graf and Saidi [13] and later in [11,20]. However, instead of using the atomic formulas
to generate an abstract global transition system, we use them to construct an explicit 
abstraction function. The abstraction function preserves logical relationships among the 
atomic formulas instead of treating them as independent propositions. The initial abstract 
model is constructed by adapting the existential abstraction techniques proposed in [8, 
10] to our framework. Then, a traditional model checker is used to determine whether 
ACTL* properties hold in the abstract model. If the answer is yes, then the concrete 
model also satisfies the property. If the answer is no, then the model checker generates 
a counterexample. Since the abstract model has more behaviors than the concrete one, 
the abstract counterexample might not be valid. We say that such a counterexample is 
spurious. Such abstraction techniques are also known as false negative techniques. 

In our methodology, we provide a new symbolic algorithm to determine whether an 
abstract counterexample is spurious. If the counterexample is not spurious, we report 
it to the user and stop. If the counterexample is spurious, the abstraction function must 
be refined to eliminate it. In our methodology, we identify the shortest prefix of the 
abstract counterexample that does not correspond to an actual trace in the concrete model. 
The last abstract state in this prefix is split into less abstract states so that the spurious 
counterexample is eliminated. Thus, a more refined abstraction function is obtained. Note 
that there may be many ways of splitting the abstract state; each determines a different 
refinement of the abstraction function. It is desirable to obtain the coarsest refinement 
which eliminates the counterexample because this corresponds to the smallest abstract 
model that is suitable for verification. We prove, however, that finding the coarsest 
refinement is NP-hard. Because of this, we use a polynomial-time algorithm which 
gives a suboptimal but sufficiently good refinement of the abstraction function. The 
applicability of our heuristic algorithm is confirmed by our experiments. Using the 
refined abstraction function obtained in this manner, a new abstract model is built and 
the entire process is repeated. Our methodology is complete for the fragment of ACTL* 
which has counterexamples that are either paths or loops, i.e., we are guaranteed to either 
find a valid counterexample or prove that the system satisfies the desired property. In 
principle, our methodology can be extended to all of ACTL*. 

Using counterexamples to refine abstract models has been investigated by a number
of other researchers, beginning with the localization reduction of Kurshan [14]. He
models a concurrent system as a composition of L-processes $L_1, \ldots, L_n$ (L-processes
are described in detail in [14]). The localization reduction is an iterative technique that
starts with a small subset of relevant L-processes that are topologically close to the
specification in the variable dependency graph. All other program variables are abstracted
away with nondeterministic assignments. If the counterexample is found to be spurious,
additional variables are added to eliminate the counterexample. The heuristic for
selecting these variables also uses information from the variable dependency graph. Note
that the localization reduction either leaves a variable unchanged or replaces it by a
nondeterministic assignment. A similar approach has been described by Balarin in [2,15].
In our approach, the abstraction functions exploit logical relationships among variables
appearing in atomic formulas that occur in the control structure of the program.
Moreover, the way we use abstraction functions makes it possible to distinguish many degrees
of abstraction for each variable. Therefore, in the refinement step only very small and
local changes to the abstraction functions are necessary and the abstract model remains
comparatively small.

Another refinement technique has recently been proposed by Lind-Nielsen and
Andersen [17]. Their model checker uses upper and lower approximations in order to handle
all of CTL. Their approximation techniques enable them to avoid rechecking the
entire model after each refinement step while guaranteeing completeness. As in [2,14] the
variable dependency graph is used both to obtain the initial abstraction and in the
refinement process. Variable abstraction is also performed in a similar manner. Therefore, our
abstraction-refinement methodology relates to their technique in essentially the same
way as it relates to the classical localization reduction.

A number of other papers [16,18,19] have proposed abstraction-refinement
techniques for CTL model checking. However, these papers do not use counterexamples to
refine the abstraction. We believe that the methods described in these papers are
orthogonal to our technique and may even be combined with ours in order to achieve
better performance. A recent technique proposed by Govindaraju and Dill [12] may be
a starting point in this direction, since it also tries to identify the first spurious state in
an abstract counterexample. It randomly chooses a concrete state corresponding to the
first spurious state and tries to construct a real counterexample starting with the image
of this state under the transition relation. The paper only talks about safety properties
and path counterexamples. It does not describe how to check liveness properties with
cyclic counterexamples. Furthermore, our method does not use random choice to extend
the counterexample; instead it analyzes the cause of the spurious counterexample and
uses this information to guide the refinement process. A more detailed comparison with
related work will be given in the full version.

Summarizing, our technique has a number of advantages over previous work: 

(i) The technique is complete for an important fragment of ACTL*.

(ii) The initial abstraction and the refinement steps are efficient and entirely
automatic. All algorithms are symbolic.

(iii) In comparison to methods like the localization reduction, we distinguish more
degrees of abstraction for each variable. Thus, the changes in the refinement are
potentially finer in our approach.

(iv) The refinement procedure is guaranteed to eliminate spurious counterexamples
while keeping the state space of the abstract model small.

We have implemented our new methodology in NuSMV [6] and applied it to a number
of benchmark designs [6]. In addition we have used it to debug a large IP core being
developed at Fujitsu [1]. The design has about 350 symbolic variables which correspond
to about 500 latches. Before using our methodology, we implemented the cone of
influence reduction [8] in NuSMV to enhance its ability to check large models. Neither our
enhanced version of NuSMV nor the recent version of SMV developed by Yang [23]
was able to verify the Fujitsu IP core design. However, by using our new technique, we
were able to find a subtle error in the design. Our program automatically abstracted 144
symbolic variables and performed three refinement steps. Currently, we are evaluating
the methodology on other complex industrial designs.

The paper is organized as follows: Section 2 gives the basic definitions and
terminology used throughout the paper. A general overview of our methodology is given in
Section 3. Detailed descriptions of our abstraction-refinement algorithms are provided
in Section 4. Performance improvements for the implementation are described in
Section 5. Experimental results are presented in Section 6. Future research is discussed in
Section 7.



2 Preliminaries 

A program P has a finite set of variables $V = \{v_1, \ldots, v_n\}$, where each variable
$v_i$ has an associated finite domain $D_{v_i}$. The set of all possible states for program P
is $D_{v_1} \times \cdots \times D_{v_n}$, which we denote by D. Expressions are built from variables in V,
constants in $D_{v_i}$, and function symbols in the usual way, e.g. $v_1 + 3$. Atomic formulas
are constructed from expressions and relation symbols, e.g. $v_1 + 3 < 5$. Similarly,
predicates are composed of atomic formulas using negation ($\neg$), conjunction ($\wedge$), and
disjunction ($\vee$). Given a predicate p, Atoms(p) is the set of atomic formulas occurring
in it. Let p be a predicate containing variables from V, and $d = (d_1, \ldots, d_n)$ be an
element from D. Then we write $d \models p$ when the predicate obtained by replacing each
occurrence of the variable $v_i$ in p by the constant $d_i$ evaluates to true.

Each variable $v_i$ in the program has an associated transition block, which defines
both the initial value and the transition relation for the variable $v_i$. An example of a
transition block for the variable $v_i$ is shown in Figure 1, where $I_i \subseteq D_{v_i}$ is the initial
expression for the variable $v_i$, each condition $C_i^j$ is a predicate, and $A_i^j$ is an expression.
The semantics of the transition block is similar to the semantics of the case statement
in the modeling language of SMV, i.e., find the least j such that in the current state
condition $C_i^j$ is true and assign the value of the expression $A_i^j$ to the variable $v_i$ in the
next state.

    init(v_i) := I_i;        init(x) := 0;             init(y) := 1;
    next(v_i) := case        next(x) := case           next(y) := case
        C_i^1 : A_i^1;           reset = TRUE : 0;         reset = TRUE : 0;
        C_i^2 : A_i^2;           x < y : x + 1;            (x = y) ∧ ¬(y = 2) : y + 1;
        ...   : ...;             x = y : 0;                (x = y) : 0;
        C_i^k : A_i^k;           else : x;                 else : y;
    esac;                    esac;                     esac;

Fig. 1. A generic transition block and a typical example
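
The first-match semantics described above can be illustrated with a short Python sketch
that evaluates the next(x) block of the typical example in Figure 1. Representing a block
as an ordered list of (condition, expression) pairs, and the name next_value, are
assumptions made for this illustration only; they are not part of SMV or NuSMV.

```python
def next_value(cases, state):
    """Return the value of the first case whose condition holds in `state`."""
    for condition, expression in cases:
        if condition(state):
            return expression(state)
    # Not reached when the block ends with an `else` case.
    raise ValueError("no case applies")


# next(x) from Figure 1, as ordered (condition, expression) pairs (hypothetical encoding).
next_x = [
    (lambda s: s["reset"], lambda s: 0),                # reset = TRUE : 0
    (lambda s: s["x"] < s["y"], lambda s: s["x"] + 1),  # x < y : x + 1
    (lambda s: s["x"] == s["y"], lambda s: 0),          # x = y : 0
    (lambda s: True, lambda s: s["x"]),                 # else : x
]

print(next_value(next_x, {"reset": False, "x": 0, "y": 1}))
```

Because the cases are tried in order, a state with reset = TRUE always yields 0, no matter
how x and y compare.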

We assume that the specifications are written in a fragment of CTL* called ACTL*
(see [10]). Assume that we are given an ACTL* specification $\varphi$, and a program P. For
each transition block $B_i$ let Atoms($B_i$) be the set of atomic formulas that appear in the
conditions. Let Atoms($\varphi$) be the set of atomic formulas appearing in the specification
$\varphi$. Atoms(P) is the set of atomic formulas that appear in the specification or in the
conditions of the transition blocks.

Each program P naturally corresponds to a labeled Kripke structure $M = (S, I, R, L)$,
where $S = D$ is the set of states, $I \subseteq S$ is a set of initial states, $R \subseteq S \times S$ is a transition
relation, and $L : S \to 2^{\mathrm{Atoms}(P)}$ is a labelling given by $L(d) = \{f \in \mathrm{Atoms}(P) \mid d \models f\}$.
Translating a program into a Kripke structure is straightforward and will not
be described here.

An abstraction h for a program P is given by a surjection $h : D \to \hat{D}$. Notice that
the surjection h induces an equivalence relation $\equiv$ on the domain D in the following
manner: let d, e be states in D, then

    $d \equiv e$ iff $h(d) = h(e)$.

Since an abstraction can be represented either by a surjection h or by an equivalence
relation $\equiv$, we sometimes switch between these representations to avoid notational overhead.

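Assume that we are given a program P and an abstraction function h for P. The
abstract Kripke structure $\hat{M} = (\hat{S}, \hat{I}, \hat{R}, \hat{L})$ corresponding to the abstraction function h
is defined as follows:

1. $\hat{S}$ is the abstract domain $\hat{D}$.

2. $\hat{I}(\hat{d})$ iff $\exists d\,(h(d) = \hat{d} \wedge I(d))$.

3. $\hat{R}(\hat{d}_1, \hat{d}_2)$ iff $\exists d_1 \exists d_2\,(h(d_1) = \hat{d}_1 \wedge h(d_2) = \hat{d}_2 \wedge R(d_1, d_2))$.

4. $\hat{L}(\hat{d}) = \bigcup_{h(d) = \hat{d}} L(d)$. (This definition will be justified in Theorem 1.)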

This abstraction technique is called existential abstraction [8]. An atomic formula
f respects an abstraction function h if for all d and d' in the domain D,
$(d \equiv d') \Rightarrow (d \models f \Leftrightarrow d' \models f)$. Let $\hat{d}$ be an abstract state. $\hat{L}(\hat{d})$ is consistent if all concrete
states corresponding to $\hat{d}$ satisfy all labels in $\hat{L}(\hat{d})$, i.e., for all $d \in h^{-1}(\hat{d})$ it holds that
$d \models \bigwedge_{f \in \hat{L}(\hat{d})} f$.

Theorem 1. Let h be an abstraction and $\varphi$ be an ACTL* specification where the atomic
subformulas respect h. Then the following holds: (i) $\hat{L}(\hat{d})$ is consistent for all abstract
states $\hat{d}$ in $\hat{M}$; (ii) $\hat{M} \models \varphi \Rightarrow M \models \varphi$.

In other words, correctness of the abstract model implies correctness of the concrete
model. On the other hand, if the abstract model invalidates an ACTL* specification, i.e.,
$\hat{M} \not\models \varphi$, the actual model may still satisfy the specification.

Example 1. Assume that for a traffic light controller (see Figure 2), we want to prove
$\psi = \mathbf{AG}\,\mathbf{AF}(state = red)$ using the abstraction function $h(red) = red$ and
$h(green) = h(yellow) = go$. It is easy to see that $M \models \psi$ while $\hat{M} \not\models \psi$. There exists an infinite
trace $\langle red, go, go, \ldots \rangle$ that invalidates the specification.




Fig. 2. Abstraction of a Traffic Light. 



If an abstract counterexample does not correspond to some concrete counterexample,
we call it spurious. For example, $\langle red, go, go, \ldots \rangle$ in the above example is a spurious
counterexample.
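
Existential abstraction over an explicit state space can be sketched in a few lines of
Python, using the traffic light of Example 1. The concrete transition relation below
(red to green to yellow and back to red) is an assumption, since Figure 2 is not
reproduced here, and the function name existential_abstraction is chosen for this
illustration only.

```python
def existential_abstraction(states, init, trans, h):
    """Return the abstract states, initial states, and transitions induced by h."""
    s_hat = {h[s] for s in states}
    # An abstract state is initial iff some concrete state mapped to it is initial.
    i_hat = {h[s] for s in init}
    # An abstract transition exists iff some pair of concrete preimages is related by R.
    r_hat = {(h[s], h[t]) for (s, t) in trans}
    return s_hat, i_hat, r_hat


# Assumed concrete traffic light: red -> green -> yellow -> red.
states = {"red", "green", "yellow"}
init = {"red"}
trans = {("red", "green"), ("green", "yellow"), ("yellow", "red")}
h = {"red": "red", "green": "go", "yellow": "go"}

s_hat, i_hat, r_hat = existential_abstraction(states, init, trans, h)
print(sorted(r_hat))
```

Under this assumed transition relation the abstract model contains the self-loop
(go, go), which is what permits the spurious trace that stays in go forever.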

When the set of possible states is given as the product $D_1 \times \cdots \times D_n$ of smaller
domains, an abstraction h can be described by surjections $h_i : D_i \to \hat{D}_i$, such that
$h(d_1, \ldots, d_n)$ is equal to $(h_1(d_1), \ldots, h_n(d_n))$, and $\hat{D}$ is equal to $\hat{D}_1 \times \cdots \times \hat{D}_n$. In
this case, we write $h = (h_1, \ldots, h_n)$. The equivalence relations $\equiv_i$ corresponding to
the individual surjections $h_i$ induce an equivalence relation $\equiv$ over the entire domain
$D = D_1 \times \cdots \times D_n$ in the obvious manner:

    $(d_1, \ldots, d_n) \equiv (e_1, \ldots, e_n)$ iff $d_1 \equiv_1 e_1 \wedge \cdots \wedge d_n \equiv_n e_n$

In previous work on existential abstraction [10], abstractions were defined for each
variable domain, i.e., $D_i$ in the above paragraph was chosen to be $D_{v_i}$, where $D_{v_i}$ is
the set of possible values for variable $v_i$. Unfortunately, many abstraction functions h
cannot be described in this simple manner. For example, let $D = \{0,1,2\} \times \{0,1,2\}$,
and $\hat{D} = \{0,1\} \times \{0,1\}$. Then there are $4^9 = 262144$ functions h from D to $\hat{D}$. Next,
consider $h = (h_1, h_2)$. Since there are $2^3 = 8$ functions from {0, 1, 2} to {0, 1}, there
are only 64 functions of this form from D to $\hat{D}$.
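
The two counts can be checked with a few lines of Python; the variable names are chosen
for this illustration only.

```python
# |D| = 3 * 3 concrete states, |D_hat| = 2 * 2 abstract states.
concrete_size = 3 * 3
abstract_size = 2 * 2

# Every concrete state may be mapped to any abstract state independently.
all_functions = abstract_size ** concrete_size

# A componentwise function h = (h1, h2) picks one of the 2**3 maps {0,1,2} -> {0,1}
# for each of the two components.
per_component = 2 ** 3
componentwise_functions = per_component ** 2

print(all_functions, componentwise_functions)
```

Only 64 of the 262144 functions decompose componentwise, which is the gap that motivates
abstracting clusters of variables instead of single variables.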

In this paper, we define abstraction functions in a different way. We partition the
set V of variables into sets of related variables called variable clusters $VC_1, \ldots, VC_m$,
where each variable cluster $VC_i$ has an associated domain $D_{VC_i} := \prod_{v \in VC_i} D_v$.
Consequently, $D = D_{VC_1} \times \cdots \times D_{VC_m}$. We define abstraction functions as surjections
on the domains $D_{VC_i}$, i.e., $D_i$ in the above paragraph is equal to $D_{VC_i}$. Thus, the notion
of abstraction used in this paper is more general than the one used in [10].

3 Overview 

For a program P and an ACTL* formula $\varphi$, our goal is to check whether the Kripke
structure M corresponding to P satisfies $\varphi$. Our methodology consists of the following
steps.

1. Generate the initial abstraction: We generate an initial abstraction h by examining
the transition blocks corresponding to the variables of the program. We consider
the conditions used in the case statements and construct variable clusters for
variables which interfere with each other via these conditions. Details can be found in
Section 4.1.

2. Model-check the abstract structure: Let $\hat{M}$ be the abstract Kripke structure
corresponding to the abstraction h. We check whether $\hat{M} \models \varphi$. If the check is affirmative,
then we can conclude that $M \models \varphi$ (see Theorem 1). Suppose the check reveals that
there is a counterexample $\hat{T}$. We ascertain whether $\hat{T}$ is an actual counterexample,
i.e., a counterexample in the unabstracted structure M. If $\hat{T}$ turns out to be an actual
counterexample, we report it to the user; otherwise $\hat{T}$ is a spurious counterexample,
and we proceed to step 3.

3. Refine the abstraction: We refine the abstraction function h by partitioning a single
equivalence class of $\equiv$ so that after the refinement the abstract structure $\hat{M}$
corresponding to the refined abstraction function does not admit the spurious
counterexample $\hat{T}$. We will discuss partitioning algorithms for this purpose in Section 4.3.
After refining the abstraction function, we return to step 2.
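
The three steps can be sketched as a loop over an explicit state space. The Python code
below is a deliberately simplified illustration and not the symbolic procedure of this
paper: it handles only the safety property "no bad state is reachable", it starts from
the two-block abstraction {bad, not bad} instead of variable clusters, and it refines by
splitting off the reachable part of the failing abstract state instead of using the
partitioning algorithms of Section 4.3. All function names and the example transition
relation are hypothetical.

```python
from collections import deque


def img(states, trans):
    """Forward image of a set of concrete states."""
    return {t for (s, t) in trans if s in states}


def abstract_path(h, init, trans, bad):
    """Shortest abstract path from an abstract initial state to an abstract bad state."""
    r_hat = {(h[s], h[t]) for (s, t) in trans}
    bad_hat = {h[s] for s in bad}
    parent = {h[s]: None for s in init}
    queue = deque(parent)
    while queue:
        a = queue.popleft()
        if a in bad_hat:
            path = []
            while a is not None:
                path.append(a)
                a = parent[a]
            return path[::-1]
        for (u, v) in r_hat:
            if u == a and v not in parent:
                parent[v] = a
                queue.append(v)
    return None


def cegar(states, init, trans, bad):
    # Step 1: initial abstraction that respects the atomic formula "bad".
    h = {s: ("bad" if s in bad else "ok") for s in states}
    while True:
        # Step 2: check the abstract structure.
        path = abstract_path(h, init, trans, bad)
        if path is None:
            return ("holds", h)

        def block(a):
            return {s for s in states if h[s] == a}

        # Replay the abstract path on the concrete model.
        s, j = block(path[0]) & set(init), 1
        while s and j < len(path):
            s_prev, j = s, j + 1
            s = img(s, trans) & block(path[j - 1])
        if s:
            return ("violated", path)
        # Step 3: split the failing abstract state into its reachable part and the rest.
        fresh = (path[j - 2], len(set(h.values())))
        for c in s_prev:
            h[c] = fresh


# Hypothetical 12-state model; states 10..12 are the bad states.
states = set(range(1, 13))
init = {1, 2, 3}
trans = {(1, 4), (2, 5), (3, 6), (5, 9), (6, 9), (7, 10), (7, 11)}
bad = {10, 11, 12}

verdict, h_final = cegar(states, init, trans, bad)
print(verdict, len(set(h_final.values())))
```

On this model the loop performs three refinements and then proves the property with five
abstract states. Adding the hypothetical edge (9, 10) instead yields a real
counterexample of length four.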



4 The Abstraction-Refinement Framework 

4.1 Generating the Initial Abstraction 

Assume that we are given a program P with n variables $V = \{v_1, \ldots, v_n\}$. Given an atomic
formula f, let var(f) be the set of variables appearing in f, e.g., var(x = y) is {x, y}.
Given a set of atomic formulas U, var(U) equals $\bigcup_{f \in U} var(f)$. In general, for any
syntactic entity X, var(X) will be the set of variables appearing in X. We say that two
atomic formulas $f_1$ and $f_2$ interfere iff $var(f_1) \cap var(f_2) \neq \emptyset$. Let $\equiv_I$ be the equivalence
relation on Atoms(P) that is the reflexive, transitive closure of the interference relation.
The equivalence class of an atomic formula $f \in \mathrm{Atoms}(P)$ is called the formula cluster
of f and is denoted by [f]. Let $f_1$ and $f_2$ be two atomic formulas. Then
$var(f_1) \cap var(f_2) \neq \emptyset$ implies that $[f_1] = [f_2]$. In other words, a variable $v_i$ cannot appear in
formulas that belong to two different formula clusters. Moreover, the formula clusters
induce an equivalence relation $\equiv_V$ on the set of variables V in the following way:

    $v_i \equiv_V v_j$ if and only if $v_i$ and $v_j$ appear in atomic formulas that belong to the
    same formula cluster.
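
The closure of the interference relation can be computed incrementally by merging
clusters that share a variable. The following Python sketch does this for the four atomic
formulas of the example program in Figure 1. Representing an atomic formula by its text
together with its variable set, and the name formula_clusters, are assumptions made for
this illustration.

```python
def formula_clusters(atoms):
    """Group atomic formulas into clusters.

    `atoms` maps a formula (as text) to the set of variables appearing in it.
    Returns a list of (formula cluster, variable cluster) pairs.
    """
    clusters = []
    for formula, variables in atoms.items():
        merged_formulas, merged_vars = {formula}, set(variables)
        untouched = []
        for cluster_formulas, cluster_vars in clusters:
            if cluster_vars & merged_vars:
                # The new formula interferes with this cluster: merge them.
                merged_formulas |= cluster_formulas
                merged_vars |= cluster_vars
            else:
                untouched.append((cluster_formulas, cluster_vars))
        clusters = untouched + [(merged_formulas, merged_vars)]
    return clusters


atoms = {
    "reset = TRUE": {"reset"},
    "x = y": {"x", "y"},
    "x < y": {"x", "y"},
    "y = 2": {"y"},
}
clusters = formula_clusters(atoms)
variable_clusters = sorted(sorted(vs) for _, vs in clusters)
print(variable_clusters)
```

Existing clusters always have pairwise disjoint variable sets, so a single pass over them
per new formula is enough to maintain the transitive closure.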

The equivalence classes of $\equiv_V$ are called variable clusters. For instance, consider a
formula cluster $FC_i = \{v_1 > 3, v_1 = v_2\}$. The corresponding variable cluster is
$VC_i = \{v_1, v_2\}$. Let $\{FC_1, \ldots, FC_m\}$ be the set of formula clusters and $\{VC_1, \ldots, VC_m\}$
the set of corresponding variable clusters. We construct the initial abstraction
$h = (h_1, \ldots, h_m)$ as follows. For each $h_i$, we set $D_{VC_i} = \prod_{v \in VC_i} D_v$, i.e., $D_{VC_i}$ is the
domain corresponding to the variable cluster $VC_i$. Since the variable clusters form a
partition of the set of variables V, it follows that $D = D_{VC_1} \times \cdots \times D_{VC_m}$. For each
variable cluster $VC_i = \{v_{i_1}, \ldots, v_{i_k}\}$, the corresponding abstraction $h_i$ is defined on
$D_{VC_i}$ as follows.

    $h_i(d_1, \ldots, d_k) = h_i(e_1, \ldots, e_k)$ iff for all atomic formulas $f \in FC_i$,
    $(d_1, \ldots, d_k) \models f \Leftrightarrow (e_1, \ldots, e_k) \models f$.

In other words, two values are in the same equivalence class if they cannot be
"distinguished" by atomic formulas appearing in the formula cluster $FC_i$. The following example
illustrates how we construct the initial abstraction h.

Example 2. Consider the program P with three variables $x, y \in \{0, 1, 2\}$, and
reset $\in$ {TRUE, FALSE} shown in Figure 1. The set of atomic formulas is Atoms(P) =
{(reset = TRUE), (x = y), (x < y), (y = 2)}. There are two formula clusters,
$FC_1$ = {(x = y), (x < y), (y = 2)} and $FC_2$ = {(reset = TRUE)}. The corresponding
variable clusters are {x, y} and {reset}, respectively. Consider the formula cluster $FC_1$.
Values (0, 0) and (1, 1) are in the same equivalence class because for all the atomic
formulas f in the formula cluster $FC_1$ it holds that $(0, 0) \models f$ iff $(1, 1) \models f$. It can be
shown that the domain {0, 1, 2} × {0, 1, 2} is partitioned into a total of five equivalence
classes by this criterion. We denote these classes by the natural numbers 0, 1, 2, 3, 4, and
list them below:

    0 = {(0,0), (1,1)},
    1 = {(0,1)},
    2 = {(0,2), (1,2)},
    3 = {(1,0), (2,0), (2,1)},
    4 = {(2,2)}

The domain {TRUE, FALSE} has two equivalence classes, one containing FALSE
and the other TRUE. Therefore, we define two abstraction functions
$h_1 : \{0, 1, 2\}^2 \to \{0, 1, 2, 3, 4\}$ and $h_2$ : {TRUE, FALSE} $\to$ {TRUE, FALSE}. The first function
$h_1$ is given by $h_1(0,0) = h_1(1,1) = 0$, $h_1(0,1) = 1$, $h_1(0,2) = h_1(1,2) = 2$,
$h_1(1,0) = h_1(2,0) = h_1(2,1) = 3$, $h_1(2,2) = 4$. The second function $h_2$ is just the
identity function, i.e., $h_2$(reset) = reset. Given the abstraction functions, we use the
standard existential abstraction techniques to compute the abstract model.
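
The five equivalence classes of the cluster {x, y} can be reproduced by grouping all
value pairs according to the truth values they give to the three atomic formulas. The
Python sketch below does this by brute-force enumeration, which is only feasible for
small explicit domains; the variable names are chosen for this illustration.

```python
from itertools import product

# Atomic formulas of the formula cluster {x = y, x < y, y = 2}.
atoms = [
    lambda x, y: x == y,
    lambda x, y: x < y,
    lambda x, y: y == 2,
]

# Two value pairs are equivalent iff no atomic formula distinguishes them,
# i.e. iff they produce the same tuple of truth values.
classes = {}
for x, y in product(range(3), repeat=2):
    signature = tuple(f(x, y) for f in atoms)
    classes.setdefault(signature, []).append((x, y))

partition = sorted(classes.values())
for number, members in enumerate(partition):
    print(number, members)
```

Sorting the classes lexicographically happens to reproduce the numbering 0 to 4 used in
Example 2.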



4.2 Model Checking the Abstract Model 

Given an ACTL* specification $\varphi$, an abstraction function h (assume that $\varphi$ respects
h), and a program P with a finite set of variables $V = \{v_1, \ldots, v_n\}$, let $\hat{M}$ be the
abstract Kripke structure corresponding to the abstraction function h. We use standard
symbolic model checking procedures to determine whether $\hat{M}$ satisfies the specification
$\varphi$. If it does, then by Theorem 1 we can conclude that the original Kripke structure
also satisfies $\varphi$. Otherwise, assume that the model checker produces a counterexample
$\hat{T}$ corresponding to the abstract model $\hat{M}$. In the rest of this section, we will focus on
counterexamples which are either (finite) paths or loops.

Identification of Spurious Path Counterexamples. First, we will tackle the case when
the counterexample $\hat{T}$ is a path $\langle \hat{s}_1, \ldots, \hat{s}_n \rangle$. Given an abstract state $\hat{s}$, the set of concrete
states s such that $h(s) = \hat{s}$ is denoted by $h^{-1}(\hat{s})$, i.e., $h^{-1}(\hat{s}) = \{s \mid h(s) = \hat{s}\}$. We extend
$h^{-1}$ to sequences in the following way: $h^{-1}(\hat{T})$ is the set of concrete paths given by the
following expression

    $\{ \langle s_1, \ldots, s_n \rangle \mid \bigwedge_{i=1}^{n} h(s_i) = \hat{s}_i \wedge I(s_1) \wedge \bigwedge_{i=1}^{n-1} R(s_i, s_{i+1}) \}$.

We will occasionally write $h^{-1}_{path}$ to emphasize the fact that $h^{-1}$ is applied to a sequence.

Next, we give a symbolic algorithm to compute $h^{-1}(\hat{T})$. Let $S_1 = h^{-1}(\hat{s}_1) \cap I$ and
R be the transition relation corresponding to the unabstracted Kripke structure M. For
$1 < i \le n$, we define $S_i$ in the following manner: $S_i := Img(S_{i-1}, R) \cap h^{-1}(\hat{s}_i)$.
In the definition of $S_i$, $Img(S_{i-1}, R)$ is the forward image of $S_{i-1}$ with respect to the
transition relation R. The sequence of sets $S_i$ is computed symbolically using OBDDs
and the standard image computation algorithm. The following lemma establishes the
correctness of this procedure.

Lemma 1. The following are equivalent:

(i) The path $\hat{T}$ corresponds to a concrete counterexample.

(ii) The set of concrete paths $h^{-1}(\hat{T})$ is non-empty.

(iii) For all $1 \le i \le n$, $S_i \neq \emptyset$.




Algorithm SplitPATH

    $S := h^{-1}(\hat{s}_1) \cap I$
    $j := 1$
    while ($S \neq \emptyset$ and $j < n$) {
        $j := j + 1$
        $S_{prev} := S$
        $S := Img(S, R) \cap h^{-1}(\hat{s}_j)$ }
    if $S \neq \emptyset$ then output counterexample
    else output $j$, $S_{prev}$



Fig. 3. An abstract counterexample 



Fig. 4. SplitPATH checks spurious path. 



Example 3. Consider a program with only one variable with domain $D = \{1, \ldots, 12\}$.
Assume that the abstraction function h maps $x \in D$ to $\lfloor (x - 1)/3 \rfloor + 1$. There are four
abstract states corresponding to the equivalence classes {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, and
{10, 11, 12}. We call these abstract states $\hat{1}$, $\hat{2}$, $\hat{3}$, and $\hat{4}$. The transitions between states
in the concrete model are indicated by the arrows in Figure 3; small dots denote
non-reachable states. Suppose that we obtain an abstract counterexample $\hat{T} = \langle \hat{1}, \hat{2}, \hat{3}, \hat{4} \rangle$. It is
easy to see that $\hat{T}$ is spurious. Using the terminology of Lemma 1, we have $S_1 = \{1, 2, 3\}$,
$S_2 = \{4, 5, 6\}$, $S_3 = \{9\}$, and $S_4 = \emptyset$. Notice that $S_4$ and therefore $Img(S_3, R)$ are
both empty.

It follows from Lemma 1 that if $h^{-1}(\hat{T})$ is empty (i.e., if the counterexample $\hat{T}$
is spurious), then there exists a minimal i ($2 \le i \le n$) such that $S_i = \emptyset$. The
symbolic Algorithm SplitPATH in Figure 4 computes this number and the set of states in
$S_{i-1}$. In this case, we proceed to the refinement step (see Section 4.3). On the other
hand, if the conditions stated in Lemma 1 are true, then SplitPATH will report a "real"
counterexample and we can stop.
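
An explicit-state rendering of SplitPATH may help to follow Example 3. The Python sketch
below uses plain sets in place of OBDDs. The transition relation is hypothetical: Figure 3
is not reproduced here, so the edges were chosen only to be consistent with the sets
$S_1$ to $S_4$ listed in Example 3. The names img and split_path are likewise assumptions
of this illustration.

```python
def img(states, trans):
    """Forward image of a set of concrete states under the transition relation."""
    return {t for (s, t) in trans if s in states}


def split_path(path, h_inv, init, trans):
    """Explicit-state SplitPATH.

    Returns ("real", S_n) if the abstract path has a concrete counterpart, and
    ("spurious", j, S_prev) otherwise, where j is the first index with S_j empty.
    """
    s = h_inv[path[0]] & init
    s_prev = set()
    j = 1
    while s and j < len(path):
        j += 1
        s_prev = s
        s = img(s, trans) & h_inv[path[j - 1]]
    if s:
        return ("real", s)
    return ("spurious", j, s_prev)


# Abstraction of Example 3: x is mapped to floor((x - 1) / 3) + 1.
h_inv = {a: {x for x in range(1, 13) if (x - 1) // 3 + 1 == a} for a in range(1, 5)}
init = {1, 2, 3}
# Hypothetical edges, consistent with S_1 = {1,2,3}, S_2 = {4,5,6}, S_3 = {9}, S_4 = {}.
trans = {(1, 4), (2, 5), (3, 6), (5, 9), (6, 9), (7, 10), (7, 11)}

result = split_path([1, 2, 3, 4], h_inv, init, trans)
print(result)
```

For the abstract path through all four abstract states, the sketch reports failure at
index 4 with $S_{prev}$ = {9}, matching the values given in Example 3.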

Identification of Spurious Loop Counterexamples. Now we consider the case when the
counterexample $\hat{T}$ includes a loop, which we write as
$\langle \hat{s}_1, \ldots, \hat{s}_i \rangle \langle \hat{s}_{i+1}, \ldots, \hat{s}_n \rangle^{\omega}$. The
loop starts at the abstract state $\hat{s}_{i+1}$ and ends at $\hat{s}_n$. Since this case is more complicated
than the path counterexamples, we first present an example in which some of the typical
situations occur.

Example 4. We consider a loop $\langle \hat{s}_1 \rangle \langle \hat{s}_2, \hat{s}_3 \rangle^{\omega}$ as shown in Figure 5. In order to find out
if the abstract loop corresponds to concrete loops, we unwind the counterexample as
demonstrated in the figure. There are two situations where cycles occur. In the figure,





Fig. 5. A loop counterexample, and its unwinding. 



for each of these situations, an example cycle (the first one occurring) is indicated by 
a fat dashed arrow. We make the following important observations: (i) A given abstract 
loop may correspond to several concrete loops of different size, (ii) Each of these loops 
may start at different stages of the unwinding, (iii) The unwinding eventually becomes 
periodic (in our case = S'!), but only after several stages of the unwinding. The size 
of the period is the least common multiple of the size of the individual loops, and thus, 
in general exponential. 

We conclude from the example that a naive algorithm may have exponential time
complexity due to an exponential number of loop unwindings. The following surprising
result shows that for $\hat{T} = \langle \hat{s}_1, \ldots, \hat{s}_i \rangle \langle \hat{s}_{i+1}, \ldots, \hat{s}_n \rangle^{\omega}$, the number of unwindings can
be bounded by $min = \min_{i+1 \le j \le n} |h^{-1}(\hat{s}_j)|$, i.e., the number of unwindings is at most the
number of concrete states for any abstract state in the loop. Let $\hat{T}_{unwind}$ denote this
unwound loop counterexample, i.e., the finite abstract path
$\langle \hat{s}_1, \ldots, \hat{s}_i \rangle \langle \hat{s}_{i+1}, \ldots, \hat{s}_n \rangle^{min+1}$.
Then the following theorem holds.

Theorem 2. The following are equivalent: (i) $\hat{T}$ corresponds to a concrete
counterexample. (ii) $h^{-1}_{path}(\hat{T}_{unwind})$ is not empty.

It can be seen from Example 4 that loop counterexamples are combinatorially more
complicated than path counterexamples. Therefore, the proof of Theorem 2 is not immediate;
for details, we refer to [7]. We conclude from Theorem 2 that the Algorithm SplitPATH
can be used to analyze abstract loop counterexamples with minor modifications. For
easy reference we shall refer to this algorithm as SplitLOOP.
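
The reduction of the loop case to the path case can be sketched as follows. The Python
code below assumes that the unwound path consists of the stem followed by min + 1 copies
of the loop, where min is the smallest preimage size of an abstract state in the loop. It
uses explicit sets in place of OBDDs, and the small model as well as the names
concrete_path_exists and loop_is_real are hypothetical.

```python
def img(states, trans):
    """Forward image of a set of concrete states."""
    return {t for (s, t) in trans if s in states}


def concrete_path_exists(path, h_inv, init, trans):
    """True iff the finite abstract path has at least one concrete counterpart."""
    s = h_inv[path[0]] & init
    for abstract_state in path[1:]:
        s = img(s, trans) & h_inv[abstract_state]
    return bool(s)


def loop_is_real(stem, loop, h_inv, init, trans):
    """Check an abstract loop counterexample by unwinding it min + 1 times."""
    bound = min(len(h_inv[a]) for a in loop)
    unwound = stem + loop * (bound + 1)
    return concrete_path_exists(unwound, h_inv, init, trans)


# Hypothetical model: abstract states A, B, C with the abstract loop <A><B, C>^omega.
h_inv = {"A": {0}, "B": {1, 2}, "C": {3, 4}}
init = {0}
trans = {(0, 1), (1, 3), (3, 2), (2, 4)}  # the concrete run 0, 1, 3, 2, 4 then stops

print(loop_is_real(["A"], ["B", "C"], h_inv, init, trans))
```

In this model two unwindings of the loop still have a concrete counterpart, and only the
third one exposes the loop as spurious. This shows why a single pass around the loop is
not sufficient.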

4.3 Refining the Abstraction 

First, we will consider the case when the counterexample $\hat{T} = \langle \hat{s}_1, \ldots, \hat{s}_n \rangle$ is a path.
Since $\hat{T}$ does not correspond to a real counterexample, by Lemma 1 (iii) there exists a
set $S_i \subseteq h^{-1}(\hat{s}_i)$ with $1 \le i < n$ such that $Img(S_i, R) \cap h^{-1}(\hat{s}_{i+1}) = \emptyset$ and $S_i$ is
reachable from the initial state set $h^{-1}(\hat{s}_1) \cap I$. Since there is a transition from $\hat{s}_i$ to $\hat{s}_{i+1}$
in the abstract model, there is at least one transition from a state in $h^{-1}(\hat{s}_i)$ to a state in
$h^{-1}(\hat{s}_{i+1})$ even though there is no transition from $S_i$ to $h^{-1}(\hat{s}_{i+1})$. We partition $h^{-1}(\hat{s}_i)$
into three subsets $S_{i,0}$, $S_{i,1}$, and $S_{i,x}$ as follows (compare Figure 6):

    $S_{i,0} = S_i$
    $S_{i,1} = \{ s \in h^{-1}(\hat{s}_i) \mid \exists s' \in h^{-1}(\hat{s}_{i+1}).\ R(s, s') \}$
    $S_{i,x} = h^{-1}(\hat{s}_i) \setminus (S_{i,0} \cup S_{i,1})$.

Intuitively, $S_{i,0}$ denotes the set of states in $h^{-1}(\hat{s}_i)$ that are reachable from initial states.
$S_{i,1}$ denotes the set of states in $h^{-1}(\hat{s}_i)$ that are not reachable from initial states, but
have at least one transition to some state in $h^{-1}(\hat{s}_{i+1})$. The set $S_{i,1}$ cannot be empty
since we know that there is a transition from $h^{-1}(\hat{s}_i)$ to $h^{-1}(\hat{s}_{i+1})$. $S_{i,x}$ denotes the
set of states that are not reachable from initial states, and do not have a transition to a
state in $h^{-1}(\hat{s}_{i+1})$. For illustration, consider again the example in Figure 3. Note that
$S_1 = \{1, 2, 3\}$, $S_2 = \{4, 5, 6\}$, $S_3 = \{9\}$, and $S_4 = \emptyset$. Using the notation introduced
above, we have $S_{3,0} = \{9\}$, $S_{3,1} = \{7\}$, and $S_{3,x} = \{8\}$. Since $S_{i,1}$ is not empty, there
is a spurious transition $\hat{s}_i \to \hat{s}_{i+1}$. This causes the spurious counterexample $\hat{T}$. Hence
in order to refine the abstraction h so that the new model does not allow $\hat{T}$, we need a
refined abstraction function which separates the two sets $S_{i,0}$ and $S_{i,1}$, i.e., we need an
abstraction function in which no abstract state simultaneously contains states from $S_{i,0}$
and from $S_{i,1}$.
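
The three-way partition can be written down directly for explicit sets. The following
Python sketch recomputes the sets for the third abstract state of the running example.
As before, the transition relation is hypothetical and was chosen only to agree with the
sets stated in the text; the function name split_failure_state is an assumption of this
illustration.

```python
def split_failure_state(block, s_i, next_block, trans):
    """Partition the failing abstract state `block` into (S_i0, S_i1, S_ix).

    S_i0: states reached along the counterexample (the set S_i).
    S_i1: remaining states with a transition into the next abstract state.
    S_ix: all other states of the block.
    """
    s_i0 = set(s_i)
    s_i1 = {s for s in block - s_i0 if any((s, t) in trans for t in next_block)}
    s_ix = block - s_i0 - s_i1
    return s_i0, s_i1, s_ix


# Hypothetical edges, consistent with S_3 = {9} and S_4 = {} in the running example.
trans = {(1, 4), (2, 5), (3, 6), (5, 9), (6, 9), (7, 10), (7, 11)}
s_30, s_31, s_3x = split_failure_state({7, 8, 9}, {9}, {10, 11, 12}, trans)
print(s_30, s_31, s_3x)
```

State 7 is the only unreachable state with an edge into {10, 11, 12}, so it alone is
responsible for the spurious abstract transition.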

It is natural to describe the needed refinement in terms of equivalence relations:
Recall that $h^{-1}(\hat{s})$ is an equivalence class of $\equiv$ which has the form $E_1 \times \cdots \times E_m$,
where each $E_j$ is an equivalence class of $\equiv_j$. Thus, the refinement $\equiv'$ of $\equiv$ is obtained by
partitioning the equivalence classes $E_j$ into subclasses, which amounts to refining the
equivalence relations $\equiv_j$. The size of the refinement is the number of new equivalence
classes. Ideally, we would like to find the coarsest refinement that separates the two sets,
i.e., the separating refinement with the smallest size. We can show however that this is
computationally intractable.

Theorem 3. (i) The problem of finding the coarsest refinement is NP-hard; (ii) when
$S_{i,x} = \emptyset$, the problem can be solved in polynomial time.

We find that the previously known problem PARTITION INTO CLIQUES can be
reduced to the coarsest refinement problem. The proof is omitted due to space restrictions.
On the other hand, we describe a polynomial time algorithm PolyRefine corresponding
to case (ii) of Theorem 3 in Figure 7. Let $P_j^{+}$, $P_j^{-}$ be two projection functions, such that
for $s = (d_1, \ldots, d_m)$, $P_j^{+}(s) = d_j$ and $P_j^{-}(s) = (d_1, \ldots, d_{j-1}, d_{j+1}, \ldots, d_m)$. Then
$proj(S_{i,0}, j, a)$ denotes the projection set $\{P_j^{-}(s) \mid P_j^{+}(s) = a, s \in S_{i,0}\}$. Intuitively,
the condition $proj(S_{i,0}, j, a) \neq proj(S_{i,0}, j, b)$ in the algorithm means that there exists
$(d_1, \ldots, d_{j-1}, d_{j+1}, \ldots, d_m) \in proj(S_{i,0}, j, a)$ and $(d_1, \ldots, d_{j-1}, d_{j+1}, \ldots, d_m) \notin proj(S_{i,0}, j, b)$.
According to the definition of $proj(S_{i,0}, j, a)$,
$s_1 = (d_1, \ldots, d_{j-1}, a, d_{j+1}, \ldots, d_m) \in S_{i,0}$ and
$s_2 = (d_1, \ldots, d_{j-1}, b, d_{j+1}, \ldots, d_m) \notin S_{i,0}$, i.e., $s_2 \in S_{i,1}$.
Note that $s_1$ and $s_2$ differ only in the j-th component. Hence, the only way to
separate $s_1$ and $s_2$ into different equivalence classes is that a and b have to be in different
equivalence classes of $\equiv'_j$, i.e., $a \not\equiv'_j b$.



Lemma 2. When $S_{i,x} = \emptyset$, the relation $\equiv'_j$ computed by PolyRefine is an equivalence
relation which refines $\equiv_j$ and separates $S_{i,0}$ and $S_{i,1}$. Furthermore, the equivalence
relation $\equiv'_j$ is the coarsest refinement of $\equiv_j$.

Note that in symbolic presentation, the projection operation $proj(S_{i,0}, j, a)$
amounts to computing a generalized cofactor, which can be easily done by standard BDD
methods. Given a function $f : D \to \{0, 1\}$, a generalized cofactor of f with respect to
$g = (\bigwedge_{k=p}^{q} x_k = d_k)$ is the function $f_g = f(x_1, \ldots, x_{p-1}, d_p, \ldots, d_q, x_{q+1}, \ldots, x_n)$.
In other words, $f_g$ is the projection of f with respect to g. Symbolically, the set $S_{i,0}$ is
represented by a function $f_{S_{i,0}} : D \to \{0, 1\}$, and therefore, the projection $proj(S_{i,0}, j, a)$
of $S_{i,0}$ to value a of the j-th component corresponds to a cofactor of $f_{S_{i,0}}$.




Algorithm PolyRefine

    for j := 1 to m {
        $\equiv'_j := \equiv_j$
        for every $a, b \in E_j$ {
            if $proj(S_{i,0}, j, a) \neq proj(S_{i,0}, j, b)$
            then $\equiv'_j := \equiv'_j \setminus \{(a, b)\}$ }}

Fig. 6. Three sets $S_{i,0}$, $S_{i,1}$, and $S_{i,x}$

Fig. 7. The algorithm PolyRefine



In our implementation, we use the following heuristics: We merge the states in $S_{i,x}$
into $S_{i,1}$, and use the algorithm PolyRefine to find the coarsest refinement that separates
the sets $S_{i,0}$ and $S_{i,1} \cup S_{i,x}$. The equivalence relation computed by PolyRefine in this
manner is not optimal, but it is a correct refinement which separates $S_{i,0}$ and $S_{i,1}$, and
eliminates the spurious counterexample. This heuristic has given good results in our
practical experiments.
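
Since two values a and b of component j stay equivalent exactly when their projection
sets coincide, PolyRefine can be sketched by grouping the values of each component by
their projection set. The Python code below works on explicit tuples in place of BDD
cofactors and follows the heuristic above in treating every state outside $S_{i,0}$ alike.
The small example block and the names proj and poly_refine are hypothetical.

```python
def proj(s_i0, j, a):
    """Projection set: states of S_i0 whose j-th component is a, with that component removed."""
    return frozenset(s[:j] + s[j + 1:] for s in s_i0 if s[j] == a)


def poly_refine(components, s_i0):
    """Refine each equivalence class E_j of the failing abstract state.

    `components` lists the classes E_1, ..., E_m whose product is the abstract state;
    `s_i0` is the set of reachable tuples. Returns, for each component, the list of
    subclasses into which E_j is split (component indices are 0-based here).
    """
    refined = []
    for j, e_j in enumerate(components):
        groups = {}
        for a in sorted(e_j):
            # a and b remain equivalent iff their projection sets are equal.
            groups.setdefault(proj(s_i0, j, a), []).append(a)
        refined.append(sorted(groups.values()))
    return refined


# Hypothetical abstract state E_1 x E_2 = {0,1,2} x {0,1} with S_i0 = {(0,0), (1,0)}.
components = [{0, 1, 2}, {0, 1}]
s_i0 = {(0, 0), (1, 0)}
refinement = poly_refine(components, s_i0)
print(refinement)
```

In this example the product of the first subclasses, {0, 1} × {0}, is exactly $S_{i,0}$, so
no refined abstract state mixes reachable and unreachable tuples.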

Since, according to Theorem 2, the algorithm SplitLOOP for loop counterexamples
works analogously to SplitPATH, the refinement procedure for spurious loop
counterexamples works analogously, too. Details are omitted due to space restrictions. Our
refinement procedure continues to refine the abstraction function by partitioning
equivalence classes until a real counterexample is found, or the ACTL* property is verified.
The partitioning procedure is guaranteed to terminate since each equivalence class must
contain at least one element. Thus, our method is complete.

Theorem 4. Given a model M and an ACTL* specification $\varphi$ whose counterexample
is either path or loop, our algorithm will find a model $\hat{M}$ such that $\hat{M} \models \varphi \Leftrightarrow M \models \varphi$.



5 Performance Improvements 

The symbolic methods described in Section 4 can be directly implemented using BDDs. 
Our implementation uses additional heuristics which are outlined in this section. For 
details, we refer to our technical report [7]. 



Fig. 8. A spurious loop counterexample $\langle \hat{1}, \hat{2} \rangle^{\omega}$



Two-phase Refinement Algorithms. Consider the spurious loop counterexample
$\hat{T} = \langle \hat{1}, \hat{2} \rangle^{\omega}$ of Figure 8. Although $\hat{T}$ is spurious, the concrete states involved in the example
contain an infinite path $\langle \hat{1}, \hat{1}, \ldots \rangle$ which is a potential counterexample. Since we know
that our method is complete, such cases could be ignored. Due to practical performance
considerations, however, we came to the conclusion that the relatively small effort to
detect additional counterexamples is justified as a valuable heuristic. For a general loop
counterexample $\hat{T} = \langle \hat{s}_1, \ldots, \hat{s}_i \rangle \langle \hat{s}_{i+1}, \ldots, \hat{s}_n \rangle^{\omega}$, we therefore proceed in two phases:

(i) We restrict the model to the state space $S_{local} := \bigcup_{1 \le i \le n} h^{-1}(\hat{s}_i)$ of the
counterexample and use the standard fixpoint computation for temporal formulas (see e.g. [8])
to check the property on the Kripke structure restricted to $S_{local}$. If a concrete
counterexample is found, then the algorithm terminates.

(ii) If no counterexample is found, we use SplitLOOP and PolyRefine to compute a
refinement as described above.

This two-phase algorithm is slightly slower than the original one if we do not find a
concrete counterexample; in many cases however, it can speed up the search for a concrete
counterexample. An analogous two-phase approach is used for finite path
counterexamples.

Approximation. Despite the use of partitioned transition relations it is often infeasible
to compute the total transition relation of the model M [8]. Therefore, the abstract
model $\hat{M}$ cannot be computed from M directly. In previous work [2,10], a method
which we call early approximation has been introduced: first, abstraction is applied to
the BDD representation of each transition block and then the BDDs for the partitioned
transition relation are built from the already abstracted BDDs for the transition blocks.
The disadvantage of early approximation is that it over-approximates the abstract model
$\hat{M}$ [9]. In our approach, a heuristic individually determines for each variable cluster $VC_i$
if early approximation should be applied or if the abstraction function should be applied
in an exact manner. Our method has the advantage that it balances over-approximation and
memory usage. Moreover, the overall method presented in our paper remains complete
with this approximation.

Abstractions For Distant Variables. In addition to the methods of Section 4.1, we
completely abstract variables whose distance from the specification in the variable
dependency graph is greater than a user-defined constant. Note that the variable dependency
graph is also used for this purpose in the localization reduction [2, 14, 17] in a similar way.
However, the refinement process of the localization reduction [14] can only turn a
completely abstracted variable into a completely unabstracted variable, while our method
uses intermediate abstraction functions.



6 Experimental Results 

We have implemented our methodology in NuSMV [6], which uses the CUDD package [21]
for symbolic representation. We performed two sets of experiments. One set is
on five benchmark designs. The other was performed on an industrial design of a
multimedia processor from Fujitsu [1]. All the experiments were carried out on a 200MHz
PentiumPro PC with 1GB of RAM running Linux.

The first benchmark designs are publicly available. The PCI example is extracted 
from [5]. The results for these designs are listed in the table. 



                              NuSMV + COI                      NuSMV + ABS
Design      #Var    #Prop   #COI   Time   |TR|     |MC|      #ABS    Time   |TR|     |MC|
gigamax     10(16)  1       0      0.3    8346     1822      9       0.2    13151    816
guidance    40(55)  8       30     35     140409   30467     34-39   30     147823   10670
p-queue     12(37)  1       4      0.5    51651    1155      5       0.4    52472    1114
waterpress  6(21)   4       0-1    273    34838    129595    4       170    38715    3335
PCI bus     50(89)  10      4      2343   121803   926443    12-13   546    160129   350226



In the table, the performance of an enhanced version of NuSMV with cone of influence
reduction (NuSMV + COI) and of our implementation (NuSMV + ABS) are compared.
#Var and #Prop are properties of the designs: #Var = x(y) means that x is the number
of symbolic variables, and y the number of Boolean variables in the design. #Prop is
the number of verified properties. The columns #COI and #ABS contain the number
of symbolic variables which have been abstracted using the cone of influence reduction
(#COI), and our initial abstraction (#ABS). The column "Time" denotes the accumulated
running time to verify all #Prop properties of the design. |TR| denotes the maximum
number of BDD nodes used for building the transition relation. |MC| denotes the
maximum number of additional BDD nodes used during the verification of the properties.
Thus, |TR| + |MC| is the maximum BDD size during the total model checking process.
For the larger examples, we use partitioned transition relations by setting the BDD size
limit to 10000.

Although our approach in one case uses 50% more memory than the traditional cone
of influence reduction to build the abstract transition relation, it requires an order of
magnitude less memory during model checking. This is an important achievement since the
model checking process is the most difficult task in verifying large designs. More significant
improvement is further demonstrated by the Fujitsu IP core design.

The Fujitsu IP core design is a multimedia assist (MMA-ASIC) processor [1]. The 
design is a system-on-a-chip that consists of a co-processor for multimedia instructions, 
a graphic display controller, peripheral I/O units, and five bus bridges. The RTL
implementation of MMA-ASIC is described in about 61,500 lines of Verilog-HDL code. After
manual abstraction by engineers from Fujitsu in [22], there still remain about 10,600 
lines of code with roughly 500 registers. We translated this abstracted Verilog code into 
9,500 lines of SMV code. In [22], the authors verified this design using a "navigated" 
model checking algorithm in which state traversal is restricted by navigation conditions 
provided by the user. Therefore, their methodology is not complete, i.e., it may fail to 
prove the correctness even if the property is true. Moreover, the navigation conditions 
are usually not automatically generated. 

In order to compare our model checker to others, we tried to verify this design using
two state-of-the-art model checkers: Yang’s SMV [23] and NuSMV [6]. We implemented
the cone of influence reduction for NuSMV, but not for Yang’s SMV. Both NuSMV + COI
and Yang’s SMV failed to verify the design. On the other hand, our system abstracted
144 symbolic variables and, with three refinement steps, successfully verified the design
and found a bug which had not been discovered before.

7 Conclusion and Future Work 

We have presented a novel abstraction refinement methodology for symbolic model 
checking. The advantages of our methodology have been demonstrated by experimental 
results. We believe that our technique is general enough to be adapted for other forms of 
abstraction. There are many interesting avenues for future research. First, we want to find 
efficient approximation algorithms for the NP-complete separation problem encountered 
during the refinement step. Moreover, in a recent paper [4], the fragment of ACTL* that 
admits “trace”-like counterexamples (of a potentially more complicated structure than 
paths and loops) has been characterized; we plan to extend our refinement algorithm 
to this language. Since the symbolic methods described in this paper are not tied to 
representation by BDDs, we will also investigate how they can be applied to recent work 
on symbolic model checking without BDDs [3]. We are currently applying our technique
to verify other large examples.

Abstract. We show how alternating automata provide decision procedures
for the equivalence of inductively defined Boolean functions that
are useful for reasoning about parameterized families of circuits. We use
alternating word automata to formalize families of linearly structured
circuits and alternating tree automata to formalize families of tree
structured circuits. We provide complexity bounds and show how our decision
procedures can be implemented using BDDs. In comparison to previous
work, our approach is simpler, yields better complexity bounds, and, in
the case of tree structured families, is more general.



1 Introduction 

Reasoning about parametric system descriptions is important in building scalable
systems and generic designs. In hardware verification, the problem arises in
verification of parametric combinational circuit families, for example, proving
that circuits in one family are equivalent to circuits in another, for every
parameter value. Another application of parametric reasoning is in establishing
properties of sequential circuits, where time is the parameter considered. In this
paper we present a new approach to these problems based on alternating automata
on words and trees.

The starting point for our research is the work of Gupta and Fisher [6,7]. They
developed a formalism for describing circuit families using one of two kinds of
inductively defined Boolean functions. The first, called Linearly Inductive Boolean
Functions, or LIFs, formalizes families of linearly structured circuits. The
second, called Exponentially Inductive Boolean Functions, or EIFs, models families
of tree structured circuits. As simple examples, consider the linear (serial)
and tree structured d-bit parity circuits described by the following diagrams.

[Figure: circuit diagrams for serial_parity and tree_parity]


A LIF describing the general case of the linear circuit is given by the following 
equations. (We will formally introduce slightly different syntax in §3 and §4.) 

$\mathit{serial\_parity}^1(b_1) = b_1$

$\mathit{serial\_parity}^n(b_1, \ldots, b_n) = b_n \oplus \mathit{serial\_parity}^{n-1}(b_1, \ldots, b_{n-1})$ for $n > 1$.

E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 170-185, 2000.

Similarly, an EIF describing the family of tree-structured parity circuits is:

$\mathit{tree\_parity}^0(b_1) = b_1$

$\mathit{tree\_parity}^n(b_1, \ldots, b_{2^n}) = \mathit{tree\_parity}^{n-1}(b_1, \ldots, b_{2^{n-1}}) \oplus \mathit{tree\_parity}^{n-1}(b_{2^{n-1}+1}, \ldots, b_{2^n})$ for $n \ge 1$.
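The two recurrences can be tried out directly. The following is a minimal sketch in Python, assuming bits are given as lists of 0/1 integers; the function names mirror the equations, while `agree_on_all_inputs` is a hypothetical helper added only for illustration. It is a brute-force check for one fixed input length, not a decision procedure for the whole family.

```python
from itertools import product
from typing import Sequence


def serial_parity(bits: Sequence[int]) -> int:
    """Linear recurrence: the last bit XOR the parity of the remaining prefix."""
    if len(bits) == 0:
        raise ValueError("input must be non-empty")
    if len(bits) == 1:
        return bits[0]
    return bits[-1] ^ serial_parity(bits[:-1])


def tree_parity(bits: Sequence[int]) -> int:
    """Tree recurrence: XOR of the parities of the two halves (length 2**n)."""
    if len(bits) == 1:
        return bits[0]
    if len(bits) == 0 or len(bits) % 2 != 0:
        raise ValueError("input length must be a power of two")
    half = len(bits) // 2
    return tree_parity(bits[:half]) ^ tree_parity(bits[half:])


def agree_on_all_inputs(n: int) -> bool:
    """Compare both definitions on every input of length 2**n (exponential cost)."""
    return all(serial_parity(list(b)) == tree_parity(list(b))
               for b in product((0, 1), repeat=2 ** n))
```

Such an exhaustive comparison only covers a single parameter value; the point of the decision procedures discussed in this paper is to establish equivalence for every parameter value at once.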



Gupta and Fisher developed algorithms to translate these descriptions into
novel data-structures that generalize BDDs (roughly speaking, their data-structures
have additional pointers between BDDs, which formalize recursion). The
resulting data-structures are canonical: different descriptions of the same family
are converted into identical data-structures. This yields a decision procedure
both for the equivalence of LIFs and for EIFs.

Motivated by their results, we take a different approach. We show how LIFs
and EIFs can be translated, respectively, into alternating word and tree automata,
whereby the decision problems for LIFs and EIFs are solvable by automata
calculations. For LIFs, the translation and decision procedure are quite direct
and may be implemented and analyzed using standard algorithms and results
for word automata. For EIFs, the situation is more subtle since input is given
by trees where only leaves are labeled by data and we are only interested in
the equality of complete trees. Here, we decide equality using a procedure that
determines whether a tree automaton accepts a complete leaf-labeled tree.

The use of alternating automata has a number of advantages. First, it gives
us a simple view of (and leads to simpler formalisms for) LIFs and EIFs based on
standard results from automata theory. For example, the expressiveness of these
languages trivially falls out of our translations: LIFs describe regular languages
on words and EIFs describe regular languages on trees (modulo the subtleties
alluded to above). Hence, LIFs and EIFs can formalize any circuit family whose
behavior is regular in the language theoretic sense. Second, it provides a handle
on the complexity of the problems. For LIFs we show that the equality problem
is PSPACE-complete and for EIFs it is in EXPSPACE. The result for LIFs
represents a doubly exponential improvement over the previous results of Gupta
and Fisher, and our results for EIFs are, to our knowledge, the first published
bounds for this problem. Finally, the use of alternating automata provides a
basis for adapting data-structures recently developed in the Mona project [10];
in their work, as well as ours, BDDs are used to represent automata and can often
exponentially compress the representation of the transition function. We show
that the use of BDDs to represent alternating automata offers similar advantages
and plays an important role in the practical use of these techniques.

We proceed as follows. In §2 we provide background material on word and
tree automata. In §3 and §4 we formalize LIFs and EIFs and explain our decision
procedures. In §5 we make comparisons and in §6 draw conclusions and discuss
future work.

2 Background

Boolean Logic The set $B(V)$ of Boolean formulae (over $V$) is built from the
constants 0 and 1, variables $v \in V$, and the connectives $\neg$, $\lor$, $\land$, $\leftrightarrow$ and $\oplus$.
For $\beta \in B(V)$, $\beta[\alpha_1/v_1, \ldots, \alpha_n/v_n]$ denotes the formula where the $v_i \in V$ are
simultaneously replaced by the formulae $\alpha_i \in B(V)$.

Boolean formulae are interpreted in the set $\mathbb{B} = \{0, 1\}$ of truth values. A
substitution is a function $\sigma : V \to \mathbb{B}$ that is homomorphically extended to $B(V)$.
For $\sigma : V \to \mathbb{B}$ and $\beta \in B(V)$ we write $\sigma \models \beta$ if $\sigma(\beta) = 1$. We will sometimes
identify a subset $M$ of $V$ with the substitution $\sigma_M : V \to \mathbb{B}$, where $\sigma_M(v) = 1$
iff $v \in M$. For example, for the formula $v_1 \oplus v_2$, we have $\{v_1\} \models v_1 \oplus v_2$ but
$\{v_1, v_2\} \not\models v_1 \oplus v_2$.



Words and Trees $\Sigma^*$ is the set of all words over the alphabet $\Sigma$. We write $\lambda$ for
the empty word and $\Sigma^+$ for $\Sigma^* \setminus \{\lambda\}$. For $u, v \in \Sigma^*$, $u.v$ denotes concatenation,
$|u|$ denotes $u$'s length, and $u^R$ denotes the reversal of $u$.

A $\Sigma$-labeled tree (with branching factor $r \in \mathbb{N}$) is a function $t$ where the range
of $t$ is $\Sigma$ and the domain of $t$, $\mathrm{dom}(t)$ for short, is a finite subset of $\{0, \ldots, r-1\}^*$
where (i) $\mathrm{dom}(t)$ is prefix closed and (ii) if $u.i \in \mathrm{dom}(t)$, then $u.j \in \mathrm{dom}(t)$ for
all $j < i$. The elements of $\mathrm{dom}(t)$ are called nodes and $\lambda \in \mathrm{dom}(t)$ is called the
root. The node $u.i \in \mathrm{dom}(t)$ is a successor of $u$. A node is an inner node if it has
successors and is a leaf otherwise. The height of $t$ is $|t| = \max(\{0\} \cup \{|u| + 1 \mid u \in \mathrm{dom}(t)\})$.
The depth of a node $u \in \mathrm{dom}(t)$ is the length of $u$.

A tree is complete if all its leaves have the same depth. The frontier of $t$
is the word $\mathrm{front}(t) \in \Sigma^*$ where the $i$th letter is the label of the $i$th leaf in $t$
(from the left). $\Sigma^{T*}$ denotes the set of all binary $\Sigma$-labeled trees and $\Sigma^{T+}$ is
$\Sigma^{T*}$ without the empty tree.



Nondeterministic Automata A nondeterministic word automaton (NWA)
$A$ is a tuple $(\Sigma, Q, q_0, F, \delta)$, where $\Sigma$ is a nonempty finite alphabet, $Q$ is a
nonempty finite set of states, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is a set of
accepting states, and $\delta : Q \times \Sigma \to \mathcal{P}(Q)$ is a transition function. A run of $A$
on a word $w = a_1 \ldots a_n \in \Sigma^*$ is a word $\pi = s_1 \ldots s_{n+1} \in Q^+$ with $s_1 = q_0$ and
$s_{i+1} \in \delta(s_i, a_i)$ for $1 \le i \le n$. $\pi$ is accepting if $s_{n+1} \in F$. A word $w$ is accepted
by $A$ if there is an accepting run of $A$ on $w$; $L(A)$ denotes the set of accepted
words.

Nondeterministic (top-down, binary) tree automata (NTA) are defined
analogously: $A$ is a tuple $(\Sigma, Q, q_0, F, \delta)$, where $\Sigma$, $Q$, $q_0$ and $F$ are as before. The
transition function is $\delta : Q \times \Sigma \to \mathcal{P}(Q \times Q)$. A run of an NTA $A$ on a tree $t \in \Sigma^{T*}$
is a tree $\pi \in Q^{T+}$ where $\mathrm{dom}(\pi) = \{\lambda\} \cup \{w.b \mid w \in \mathrm{dom}(t) \text{ and } b \in \{0, 1\}\}$.
Moreover, $\pi(\lambda) = q_0$ and for $w \in \mathrm{dom}(t)$, $(\pi(w.0), \pi(w.1)) \in \delta(\pi(w), t(w))$. The
run $\pi$ is accepting if $\pi(w) \in F$ for every leaf $w \in \mathrm{dom}(\pi)$. A tree $t$ is accepted by
$A$ if there is an accepting run of $A$ on $t$; $L(A)$ denotes the set of accepted trees.

NWAs and NTAs recognize the regular word and tree languages and are
effectively closed under intersection, union, complement and projection. For a



Alternating Automata Alternating automata for words were introduced in [2,3]
and for trees in [15]. We use the definition of alternating automata for words
from [17] and generalize it to trees. For this we need the notion of the positive
Boolean formulae: Let $B^+(V)$ be the set of Boolean formulae built from 0, 1,
$v \in V$, and the connectives $\lor$ and $\land$.

An alternating word automaton (AWA) is of the form $A = (\Sigma, Q, q_0, F, \delta)$
where everything is as before, except for the transition function $\delta : Q \times \Sigma \to B^+(Q)$.
The same holds for alternating tree automata (ATA) where the only
difference is the transition function $\delta : Q \times \Sigma \to B^+(Q \times \{L, R\})$. We write
$q^X$ for $(q, X) \in Q \times \{L, R\}$.

We will only define a run for an ATA; the restriction to AWA is straightforward.
For an ATA, a run $\pi$ of $A$ on $t \in \Sigma^{T*}$ is a $Q \times \{0, 1\}^*$-labeled tree, with
$\pi(\lambda) = (q_0, \lambda)$. Moreover, for each node $w \in \mathrm{dom}(\pi)$, with $\pi(w) = (q, u)$, and for
all of the $r \in \mathbb{N}$ successor nodes of $w$:

$\{p^L \mid \pi(w.k) = (p, u.0) \text{ for } 0 \le k < r\} \cup \{p^R \mid \pi(w.k) = (p, u.1) \text{ for } 0 \le k < r\} \models \delta(q, t(u))$.

$\pi$ is accepting if for every leaf $w$ in $\pi$, where $\pi(w) = (p, u.k)$, $u$ is a leaf in $t$ and
$p \in F$. The tree language accepted by $A$ is $L(A) = \{t \in \Sigma^{T*} \mid A \text{ accepts } t\}$. If
there exists an accepting run of $A' = (\Sigma, Q, q, F, \delta)$ for $q \in Q$ on $t$, then we say
that $A$ accepts $t$ from $q$. We use the same terminology for AWAs.

It is straightforward to construct an alternating automaton from a nonde- 
terministic automaton of the same size. Conversely, given an AWA one can con- 
struct an equivalent NWA with at most exponentially more states [2,3,17]. The 
states of the nondeterministic automaton are the interpretations of the Boolean 
formulae of the alternating automaton’s transition function. This construction 
can be generalized to tree automata. Hence alternation does not increase the 
expressiveness of word and tree automata but, as we will see, it does enhance 
their ability to model problems. 

3 Linearly Inductive Boolean Functions 

3.1 Definition of LIFs 

We now define linearly inductive Boolean functions. Our definition differs slightly
from [6,7,8]; however, the definitions are equivalent (see §5).

Syntax Let the two sets $V = \{v_1, \ldots, v_r\}$ and $F = \{f_1, \ldots, f_s\}$ be fixed for the
remainder of this paper. A LIF formula (over $V$ and $F$) is a pair $(\alpha, \beta)$, with
$\alpha \in B(V)$ and $\beta \in B(V \uplus F)$. The formulae $\alpha$ and $\beta$ formalize the base and step case
of a recursive definition. A LIF system (over $V$ and $F$) is a pair $(\mathcal{F}, \eta)$ where $\mathcal{F}$ is a
set of LIF formulae over $V$ and $F$ and $\eta : F \to \mathcal{F}$. That is, $\eta$ assigns each $f \in F$ a LIF
formula $(\alpha, \beta) \in \mathcal{F}$. We will write $(\alpha_f, \beta_f)$ for $\eta(f) = (\alpha, \beta)$ and omit $V$ and $F$
when they are clear from the context.

Semantics Let $S$ be a LIF system. An evaluation of $S$ on $w = b_1 \ldots b_n \in (\mathbb{B}^r)^+$
is a word $d_1 \ldots d_n \in (\mathbb{B}^s)^+$ such that for $b_i = (a^i_1, \ldots, a^i_r)$, $d_i = (c^i_1, \ldots, c^i_s)$, and
$1 \le k \le s$:

$c^1_k = 1$ iff $\{v_l \mid a^1_l = 1, \text{ for } 1 \le l \le r\} \models \alpha_{f_k}$,

and for all $i$, $1 < i \le n$,

$c^i_k = 1$ iff $\{v_l \mid a^i_l = 1, \text{ for } 1 \le l \le r\} \cup \{f_l \mid c^{i-1}_l = 1, \text{ for } 1 \le l \le s\} \models \beta_{f_k}$.

An easy induction over the length of $w$ shows:

Lemma 1. For $S$ a LIF system and $w \in (\mathbb{B}^r)^+$, the evaluation of $S$ on $w$ is
uniquely defined.

Hence $f_k \in F$ together with $S$ determine a function $f_k^S : (\mathbb{B}^r)^+ \to \mathbb{B}$. Namely,
for $w \in (\mathbb{B}^r)^+$, $f_k^S(w) = c^{|w|}_k$. We call $f_k^S$ the LIF of $S$ and $f_k$. When $S$ is clear
from the context, we omit it.



Examples We present three simple examples. First, for $V = \{x\}$ and
$F = \{\mathit{serial\_parity}\}$, the following LIF system $S_1$ formalizes the family of linear parity
circuits given in the introduction.

$\alpha_{\mathit{serial\_parity}} = x$        $\beta_{\mathit{serial\_parity}} = x \oplus \mathit{serial\_parity}$

In particular, $\mathit{serial\_parity}^{S_1}$ applied to $b_1 \ldots b_n \in \mathbb{B}^+$ equals $\mathit{serial\_parity}^n(b_1, \ldots, b_n)$.

The second LIF system $S_2$ over $V = \{a, b, cin\}$ and $F = \{sum, carry\}$ formalizes
a family of ripple-carry adders.

$\alpha_{sum} = (a \oplus b) \oplus cin$        $\beta_{sum} = (a \oplus b) \oplus carry$

$\alpha_{carry} = (a \land b) \lor ((a \lor b) \land cin)$        $\beta_{carry} = (a \land b) \lor ((a \lor b) \land carry)$

Here $sum^{S_2}$ [respectively $carry^{S_2}$] represents the adder’s $n$th output bit [respectively
carry bit].

The third example shows how to describe a sequential circuit by a LIF system.
The LIF system $S_3$ over $V = \{e\}$ and $F = \{Y_1, Y_2, Y_3\}$ describes a 3-bit counter
with an enable bit.

$\alpha_{Y_1} = 0$        $\beta_{Y_1} = (e \land \neg Y_1) \lor (\neg e \land Y_1)$

$\alpha_{Y_2} = 0$        $\beta_{Y_2} = (e \land (Y_1 \oplus Y_2)) \lor (\neg e \land Y_2)$

$\alpha_{Y_3} = 0$        $\beta_{Y_3} = (e \land ((Y_1 \land Y_2) \oplus Y_3)) \lor (\neg e \land Y_3)$.

$Y_i^{S_3}(w)$, with $w \in \mathbb{B}^+$, is the value of the $i$th output bit at time $|w|$ of the 3-bit
counter, where $w$ encodes the enable input signals.
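To make the LIF semantics concrete, the evaluation $d_1 \ldots d_n$ can be computed letter by letter. The following Python sketch is our illustration under stated assumptions: formulae are encoded as Python functions over a dictionary of 0/1 values, and the names `evaluate_lif`, `S1`, `S2` and `S3` are hypothetical encodings of the three example systems, not part of any existing tool.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Env = Dict[str, int]                             # 0/1 values of variables and function symbols
Formula = Callable[[Env], int]                   # a Boolean formula as a Python function
LifSystem = Dict[str, Tuple[Formula, Formula]]   # f -> (alpha_f, beta_f)


def evaluate_lif(system: LifSystem, variables: Sequence[str],
                 word: Sequence[Sequence[int]]) -> List[Dict[str, int]]:
    """Return the evaluation d_1 ... d_n of `system` on the non-empty `word`."""
    if not word:
        raise ValueError("LIFs are defined on non-empty words only")
    rows: List[Dict[str, int]] = []
    for position, letter in enumerate(word):
        env: Env = dict(zip(variables, letter))
        if position == 0:
            # Base case: only the input variables are available.
            row = {f: alpha(env) & 1 for f, (alpha, _) in system.items()}
        else:
            # Step case: the previous row supplies the function symbols.
            env.update(rows[-1])
            row = {f: beta(env) & 1 for f, (_, beta) in system.items()}
        rows.append(row)
    return rows


# S_1: serial parity over V = {x}.
S1: LifSystem = {"serial_parity": (lambda v: v["x"],
                                   lambda v: v["x"] ^ v["serial_parity"])}

# S_2: ripple-carry adder over V = {a, b, cin}.
S2: LifSystem = {
    "sum": (lambda v: (v["a"] ^ v["b"]) ^ v["cin"],
            lambda v: (v["a"] ^ v["b"]) ^ v["carry"]),
    "carry": (lambda v: (v["a"] & v["b"]) | ((v["a"] | v["b"]) & v["cin"]),
              lambda v: (v["a"] & v["b"]) | ((v["a"] | v["b"]) & v["carry"])),
}

# S_3: 3-bit counter with enable bit over V = {e}.
S3: LifSystem = {
    "Y1": (lambda v: 0,
           lambda v: (v["e"] & (1 - v["Y1"])) | ((1 - v["e"]) & v["Y1"])),
    "Y2": (lambda v: 0,
           lambda v: (v["e"] & (v["Y1"] ^ v["Y2"])) | ((1 - v["e"]) & v["Y2"])),
    "Y3": (lambda v: 0,
           lambda v: (v["e"] & ((v["Y1"] & v["Y2"]) ^ v["Y3"])) | ((1 - v["e"]) & v["Y3"])),
}
```

For instance, `evaluate_lif(S2, ["a", "b", "cin"], [(1, 1, 0), (1, 0, 0), (0, 0, 0)])` adds 3 and 1 with the least significant bit first. Since all three base formulae of $S_3$ are 0, the first letter only initializes the counter, and the enable bits of the later letters are the ones that are counted.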



3.2 Equivalence of LIF Systems and AWAs

A function $g : (\mathbb{B}^r)^+ \to \mathbb{B}$ is LIF-representable if there exists a LIF system $S$
and an $f \in F$, where $g(w) = f^S(w)$ for all $w \in (\mathbb{B}^r)^+$. A language $L \subseteq (\mathbb{B}^r)^+$
is LIF-representable if its characteristic function, $g(w) = 1$ iff $w \in L$, is
LIF-representable. Gupta and Fisher have shown in [6,9] that any LIF-representable
language is regular. They prove that their data-structure for representing a LIF
system corresponds to a minimal deterministic automaton that accepts the
language $\{w^R \mid f^S(w) = 1, \text{ for } w \in (\mathbb{B}^r)^+\}$.

We present here a simpler proof of regularity by showing that LIF systems
directly correspond to AWAs. We also prove a weakened form of the converse:
almost all regular languages are LIF-representable. The weakening though is
trivial and concerns the empty word, and if we consider languages without the
empty word we have an equivalence.¹ Hence, for the remainder of this section,
we consider only automata (languages) that do not accept (include) the empty
word $\lambda$.

For technical reasons we will work with LIF systems in a kind of negation
normal form. A Boolean formula $\beta \in B(X)$ is positive in $Y \subseteq X$ if negations
occur only directly in front of the Boolean variables $v \in X \setminus Y$ and, furthermore,
the only connectives allowed are $\land$ and $\lor$. A LIF system $S$ is in normal form
if $\beta_f$ is positive in $F$, for each $f \in F$.

Lemma 2. Let $S$ be a LIF system over $V$ and $F$. Then there is a LIF system
$S'$ over $V$ and $F' = F \uplus \{\bar{f} \mid f \in F\}$ in normal form where, for all $f \in F$ and
$w \in (\mathbb{B}^r)^+$, $f^{S'}(w) = f^S(w)$ and $\bar{f}^{S'}(w) = 1$ iff $f^S(w) = 0$.

Proof. Without loss of generality, we assume that for $\beta \in B(X)$ only the
connectives $\neg$, $\lor$, and $\land$ occur. The other connectives can be eliminated as standard,
which may lead to exponentially larger formulae. By $\mathrm{nnf}(\beta)$ we denote the
negation normal form of $\beta \in B(X)$.

By using the same idea as [3], it is easy to construct a LIF system $S'$ by
introducing for each $f \in F$ a new variable $\bar{f}$ that “simulates” $\neg f$. Let $S = (\mathcal{F}, \eta)$.
For $f \in F$, with $\eta(f) = (\alpha, \beta)$, the mapping $\eta'$ of the LIF system $S'$ is defined
by $\eta'(f) = (\alpha, \gamma)$ and $\eta'(\bar{f}) = (\neg\alpha, \bar{\gamma})$ where $\gamma$ and $\bar{\gamma}$ are obtained from $\mathrm{nnf}(\beta)$,
respectively $\mathrm{nnf}(\neg\beta)$, by replacing the sub-formulae $\neg f_i$ by $\bar{f}_i$. □

We now prove that LIF-representable languages and ($\lambda$-free) regular languages
coincide.

¹ We can easily redefine LIFs to define functions over $(\mathbb{B}^r)^*$. However, following Gupta
and Fisher we avoid this as the degenerate base case (0 length input) is ill-suited
for modeling parametric circuits. Ignoring the empty word is immaterial for our
complexity and algorithmic analysis.




176 A. Ayari, D. Basin, and F. Klaedtke 



Theorem 1. LIF systems are equivalent to AWAs. In particular:

i) Given an AWA $A = (\mathbb{B}^r, Q, q_0, F, \delta)$, there is a LIF system $S$ over
$V = \{v_1, \ldots, v_r\}$ and $Q$ in normal form such that for all $w \in (\mathbb{B}^r)^+$ and $q \in Q$,

$q^S(w) = 1$ iff $A$ accepts $w^R$ from $q$.

ii) Given a LIF system $S$ in normal form over $V$ and $F$, there exists an AWA
$A$ with states $F \uplus \{q_{base}, q_{step}\}$ such that for all $w \in (\mathbb{B}^r)^+$ and $f \in F$,

$A$ accepts $w$ from $f$ iff $f^S(w^R) = 1$.

Proof. (i) We encode each $b \in \mathbb{B}^r$ by a formula $\gamma_b \in B(V)$. For example,
$(0, 1, 1, 0) \in \mathbb{B}^4$ is encoded as the Boolean formula $\gamma_{(0,1,1,0)} = \neg v_1 \land v_2 \land v_3 \land \neg v_4$.
The LIF formula for $q$ in $S$ is

$\alpha_q = \bigvee_{b \in \mathbb{B}^r} (\gamma_b \land \theta(q, b))$ and $\beta_q = \bigvee_{b \in \mathbb{B}^r} (\gamma_b \land \delta(q, b))$

with $\theta(q, b) = 1$ iff $F \models \delta(q, b)$. Here, the Boolean formula $\beta_q$ simulates the
transition from the state $q$ on a non-final letter of the input word. The final
state set $F$ is simulated by the Boolean formula $\alpha_q$, i.e., $F \models \delta(q, b)$ iff
$\{v_i \mid b_i = 1\} \models \alpha_q$.

We prove (i) by induction over the length of $w \in (\mathbb{B}^r)^+$. If $|w| = 1$, then the
equivalence follows from the definition of $\alpha_q$, for any $q \in Q$. Assume (i) is true
for the word $w$, i.e., for each $q_k \in Q$, $A$ accepts $w^R$ from $q_k$ iff $q_k^S(w) = 1$. Let
$u.d$ be an evaluation of $S$ on $w$ with $d = (c_1, \ldots, c_{|Q|})$. It holds that $q_k^S(w) = 1$
iff $c_k = 1$. We prove (i) for $w.b$ with $b = (a_1, \ldots, a_r)$. As defined, for each $q_k \in Q$
we have $q_k^S(w.b) = 1$ iff

$\{v_l \mid a_l = 1, \text{ for } 1 \le l \le r\} \cup \{q_l \mid c_l = 1, \text{ for } 1 \le l \le |Q|\} \models \bigvee_{b' \in \mathbb{B}^r} (\gamma_{b'} \land \delta(q_k, b'))$.

By the induction hypothesis, we obtain $\{q_l \mid A \text{ accepts } w^R \text{ from } q_l, \text{ for } 1 \le l \le |Q|\} \models \delta(q_k, b)$.
From this we can easily construct an accepting run of $A$ from $q_k$
on $(w.b)^R$. The other direction holds by definition of an accepting run.

(ii) For an arbitrary $g \in F$, let $A = (\mathbb{B}^r, F \uplus \{q_{base}, q_{step}\}, g, \{q_{base}\}, \delta)$ with
$\delta(q_{base}, b) = 0$, $\delta(q_{step}, b) = q_{base} \lor q_{step}$, and for $f \in F$

$\delta(f, (b_1, \ldots, b_r)) = (q_{step} \land \beta_f[b_1/v_1, \ldots, b_r/v_r]) \lor \begin{cases} q_{base} & \text{if } \{v_i \mid b_i = 1\} \models \alpha_f \\ 0 & \text{otherwise} \end{cases}$

Intuitively, when $A$ is in state $f \in F$ and reads $(b_1, \ldots, b_r) \in \mathbb{B}^r$ it guesses
if the base case is reached. When this is the case, the next state is $q_{base}$ iff
$\{v_i \mid b_i = 1\} \models \alpha_f$. Otherwise, if the base case is not reached, the AWA proceeds
according to the step case given by the Boolean formula $\beta_f$ of the LIF system.
The equivalence is proved in a similar way to (i). □

Note that if a LIF formula only uses the connectives $\land$ and $\lor$, then,
following the proof of Lemma 2, a normal form can be obtained in polynomial
time. Moreover, if $V$ is fixed, the AWA $A$ of Theorem 1(ii) can be
constructed in polynomial time, since the size of the alphabet $\mathbb{B}^r$ is a constant.
However, if we allow $V$ to vary, then the size of the AWA constructed can be
exponentially larger than the size of the LIF system, i.e.
$|V| + |F| + \sum_{f \in F} (|\alpha_f| + |\beta_f|)$, since the input alphabet of $A$ is of size $2^{|V|}$.

3.3 Deciding LIF Equality 

Given LIF systems $S$ over $V$ and $F$, and $T$ over $V$ and $G$, and function symbols
$f_k \in F$ and $g_l \in G$, the equality problem for LIFs is to decide whether $f_k^S(w) = g_l^T(w)$,
for all $w \in (\mathbb{B}^r)^+$. We first show that this problem is PSPACE-complete
and afterwards show how, using BDDs, the construction in Theorem 1 provides
the basis for an efficient implementation.

Theorem 2. The equality problem for LIFs is PSPACE-complete.

Proof. We reduce the emptiness problem for AWAs, which is PSPACE-hard [12,17],
to the equality problem for LIFs. Given an AWA $A$ with initial state $q_0$,
by Theorem 1(i) we can construct an equivalent LIF system $S$ in polynomial
time. Let the LIF system $T$ be given by the formulae $\alpha_g = 0$ and $\beta_g = 0$. Then
$q_0^S = g^T$ iff $L(A) = \emptyset$.

Theorem 1(ii) cannot be used to show that the problem is in PSPACE
because, as explained in the previous section, both the normal form and the size of
the two constructed AWAs can be exponentially larger than the size of the LIF
instances. Hence, we instead give a direct proof. The following Turing machine
$M$ accepts a problem instance in PSPACE iff a word $w = b_1 \ldots b_n \in (\mathbb{B}^r)^+$
exists with $f_k^S(w) \ne g_l^T(w)$. Let $d_1 \ldots d_n \in (\mathbb{B}^{|F|})^+$ be the evaluation of $S$ on $w$
and $d'_1 \ldots d'_n \in (\mathbb{B}^{|G|})^+$ be the evaluation of $T$ on $w$. $M$ guesses in the $i$th step
$b_i \in \mathbb{B}^r$ and calculates $d_i = (c_1, \ldots, c_{|F|})$ and $d'_i = (c'_1, \ldots, c'_{|G|})$ of the
evaluations. If $c_k \ne c'_l$ then $M$ accepts the instance and otherwise $M$ continues with the
$(i+1)$th step. Note that for the $i$th step only $d_{i-1}$, $d'_{i-1}$ and $b_i$ are required
to calculate $d_i$ and $d'_i$. Hence $M$ runs in polynomial space, since in the $i$th step
the space $|V|$ is required to store $b_i$ and the space $2(|F| + |G|)$ to store $d_{i-1}$, $d_i$,
$d'_{i-1}$ and $d'_i$. $M$ needs linear time in the size of the LIF formulae of $S$ and $T$ to
calculate $d_i$ and $d'_i$ from $b_i$, $d_{i-1}$ and $d'_{i-1}$. Since PSPACE is closed under
nondeterminism and complementation, the equality problem for LIFs is in PSPACE. □
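The machine in the proof keeps only the previous pair of evaluation vectors and guesses the next letter. A deterministic counterpart, sketched below in Python, instead explores every reachable pair of vectors. This is only an illustration under our own assumptions: it is not the PSPACE procedure itself (the set of visited pairs may grow exponentially), formulae are encoded as Python functions, and the names `step` and `lifs_equal` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

Env = Dict[str, int]
Formula = Callable[[Env], int]
LifSystem = Dict[str, Tuple[Formula, Formula]]   # f -> (alpha_f, beta_f)
Row = Tuple[Tuple[str, int], ...]                # hashable form of one vector d_i


def step(system: LifSystem, variables: Sequence[str],
         letter: Sequence[int], previous: Optional[Row]) -> Row:
    """Compute d_i from the letter b_i and the previous vector d_{i-1} (None for i = 1)."""
    env: Env = dict(zip(variables, letter))
    if previous is None:
        return tuple(sorted((f, alpha(env) & 1) for f, (alpha, _) in system.items()))
    env.update(dict(previous))
    return tuple(sorted((f, beta(env) & 1) for f, (_, beta) in system.items()))


def lifs_equal(s: LifSystem, f: str, t: LifSystem, g: str,
               variables: Sequence[str]) -> bool:
    """Return True iff f (in s) and g (in t) agree on every non-empty word."""
    letters = list(product((0, 1), repeat=len(variables)))
    frontier = {(step(s, variables, b, None), step(t, variables, b, None))
                for b in letters}
    seen = set()
    while frontier:
        for d, d_prime in frontier:
            if dict(d)[f] != dict(d_prime)[g]:
                return False          # some word distinguishes the two functions
        seen |= frontier
        frontier = {(step(s, variables, b, d), step(t, variables, b, d_prime))
                    for d, d_prime in frontier for b in letters} - seen
    return True


# Hypothetical example systems over V = {x}.
PARITY: LifSystem = {"p": (lambda v: v["x"], lambda v: v["x"] ^ v["p"])}
ODD: LifSystem = {"odd": (lambda v: v["x"],
                          lambda v: (v["x"] & (1 - v["odd"])) | ((1 - v["x"]) & v["odd"]))}
ANY_ONE: LifSystem = {"o": (lambda v: v["x"], lambda v: v["x"] | v["o"])}
```

Here `lifs_equal(PARITY, "p", ODD, "odd", ["x"])` compares two descriptions of the same parity function, whereas `PARITY` and `ANY_ONE` already differ on the word 11. The search terminates because there are only finitely many pairs of vectors.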

Although the machinery of alternating automata may appear a bit heavy, it
leads to simple translations as there is a direct correspondence between function
symbols in a LIF system and states in the corresponding AWA. This would not
be possible using nondeterministic automata. Because the emptiness problem for
NWAs is LOGSPACE-complete and the equality problem for LIFs is PSPACE-complete,
a translation of a LIF system to a nondeterministic automaton must,
in general, lead to an exponential blow-up in the state space.

Input: AWA $A = (\Sigma, Q, q_0, F, \delta)$
Output: returns true iff $L(A) = \emptyset$

Current := $\{\{q_0\}\}$;
Processed := $\emptyset$;
while Current $\ne \emptyset$ do begin
    if Current $\cap\ \mathcal{P}(F) \ne \emptyset$ then return false;
    else begin
        Processed := Processed $\cup$ Current;
        Current := $\{T \subseteq Q \mid T \models \bigwedge_{q \in S} \delta(q, a),\ S \in$ Current$,\ a \in \Sigma\} \setminus$ Processed;
    end;
end;
return true;



Fig. 1. Decision procedure for the emptiness problem for AWAs. 



Implementation In the proof of Theorem 2 we did not use the mapping from
LIFs to AWAs given by Theorem 1(ii) due to the possible exponential blow-up
when normalizing the LIF system, and the certain exponential blow-up in
representing the AWA’s alphabet. We describe here how these blow-ups can
sometimes be avoided by using BDDs.

The reduction of LIF equality to the emptiness problem for AWAs is
straightforward. From the LIF systems $S$ over $V$ and $F$, and $T$ over $V$ and $G$ we
construct the LIF system $\tilde{S}$ over $V$ and $\{\tilde{f}\} \uplus F \uplus G$ with the additional LIF formula
$\alpha_{\tilde{f}} = \neg(\alpha_f \leftrightarrow \alpha_g)$ and $\beta_{\tilde{f}} = \neg(\beta_f \leftrightarrow \beta_g)$. We then normalize $\tilde{S}$ and use Theorem
1(ii) to construct the AWA $A$ with the initial state $\tilde{f}$. By construction, $L(A) \ne \emptyset$
iff $\tilde{f}^{\tilde{S}}(w) = 1$ for some $w \in (\mathbb{B}^r)^+$ iff $f^S \ne g^T$.

To decide if an AWA $A = (\Sigma, Q, q_0, F, \delta)$ accepts the empty language, we
construct “on-the-fly” the equivalent NWA $B = (\Sigma, \mathcal{P}(Q), \{q_0\}, \mathcal{P}(F), \delta')$ with

$\delta'(T, a) = \{T' \subseteq Q \mid T' \models \bigwedge_{q \in T} \delta(q, a)\}$,

and search for a path from the initial state $\{q_0\}$ of $B$ to a final state. We do
this with a parallel breadth-first search in the state space of $B$ as described in
Figure 1.

To analyze the complexity, observe that the while-loop is traversed at most
$2^{|Q|}$ times and the calculation in each iteration requires $O(2^{|Q|} |\Sigma|)$ time.
Hence the worst-case running time is $O(2^{2|Q|} |\Sigma|)$. We need two vectors of
length $2^{|Q|}$ to represent the sets Current and Processed. Hence the required space
is the maximum of $O(2^{|Q|})$ and the size of the representation of the AWA $A$.
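The procedure of Figure 1 can be sketched with explicit sets in place of BDDs. The Python code below is a minimal illustration under our own assumptions: positive Boolean formulae are encoded as monotone predicates on sets of states, all subsets of $Q$ are enumerated naively, and the names `awa_is_empty` and `powerset` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, Set, Tuple

State = str
Letter = str
# A positive Boolean formula over Q, given as a monotone predicate on subsets of Q.
Formula = Callable[[FrozenSet[State]], bool]


def powerset(states: Iterable[State]) -> Iterator[FrozenSet[State]]:
    """Enumerate all subsets of `states`."""
    items = sorted(states)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def awa_is_empty(states: Set[State], alphabet: Iterable[Letter], initial: State,
                 accepting: Set[State],
                 delta: Dict[Tuple[State, Letter], Formula]) -> bool:
    """Breadth-first search over the subset automaton, following Figure 1."""
    final = frozenset(accepting)
    letters = list(alphabet)
    current: Set[FrozenSet[State]] = {frozenset([initial])}
    processed: Set[FrozenSet[State]] = set()
    while current:
        # Current ∩ P(F) ≠ ∅: some reached set consists of accepting states only.
        if any(t <= final for t in current):
            return False
        processed |= current
        successors: Set[FrozenSet[State]] = set()
        for s in current:
            for a in letters:
                for t in powerset(states):
                    # t must satisfy the conjunction of delta(q, a) for all q in s.
                    if all(delta[(q, a)](t) for q in s):
                        successors.add(t)
        current = successors - processed
    return True


# Hypothetical two-state examples over the alphabet {a} with F = {q}.
NONEMPTY = {("p", "a"): lambda t: "q" in t, ("q", "a"): lambda t: False}
EMPTY = {("p", "a"): lambda t: "p" in t and "q" in t, ("q", "a"): lambda t: "q" in t}
```

With `states = {"p", "q"}`, initial state `"p"` and accepting set `{"q"}`, the first automaton accepts the word a, while the second can never leave the set {p, q}. The explicit enumeration of subsets is what the BDD encoding is meant to avoid in practice.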

It is possible to use BDDs in two places to sometimes achieve an exponential
savings in space. First, the sets Current, Processed $\subseteq \mathcal{P}(Q)$ can be encoded as
BDDs where a BDD represents the characteristic function of the set. Second,
since the size of $A$’s alphabet ($\mathbb{B}^r$) is exponential in $|V|$, we use the same idea
that Gupta and Fisher employed for their representation of LIFs: we need not
explicitly represent the exponentially large alphabet and we can use BDDs to
represent the transition function. For example, Figure 2 depicts the representation
of the AWA $A = (\mathbb{B}, \{q_0, q_1, q_2\}, q_0, \{q_2\}, \delta)$ with the transition function

$\delta(q_0, 0) = q_1 \land q_2$        $\delta(q_0, 1) = q_1$

$\delta(q_1, 0) = q_1 \land q_2$        $\delta(q_1, 1) = q_1 \lor q_2$

$\delta(q_2, 0) = 1$        $\delta(q_2, 1) = q_1$.

The solid [respectively dashed] lines correspond to the variable assignment 1
[respectively 0]. For example, the state $q_0$ has a pointer to a BDD, where the first
node (labeled a) encodes the alphabet; the solid line from this node points to a
BDD representing $\delta(q_0, 1) = q_1$ and the dashed line points to a BDD representing
$\delta(q_0, 0) = q_1 \land q_2$.

Fig. 2. Representation of an AWA.

We have implemented the emptiness test for AWAs using the CUDD package 
[16] and have begun preliminary testing and comparison. For the examples given 
in §3.1, building AWAs for the descriptions given and testing them for emptiness 
or equivalence with alternative descriptions is very fast: it takes a fraction of a 
second and most of the time is spent with I/O. We can carry out more ambitious 
tests by scaling up the sequential 3-bit counter example, namely performing tests 
on an n-bit counter for different values of n. This example is also interesting 
as it demonstrates the worst-case performance of our decision procedure since
exponentially many states of the NWA must be constructed to decide if the AWA
describes the empty language.

Table 1 gives empirical results of the required space and time for the emp- 
tiness test for the resulting AWAs on a SUN Sparc Ultra with 250MHz. In the 
rightmost column are the running times on a SUN Sparc Ultra with 300MHz 
for building the canonical representation in Gupta and Fisher’s approach. For 
large values of n, our approach yields significantly better results, although in
both cases the algorithms require exponential time and space. Note that some
care must be taken in comparing these results: we have not included the time
taken in constructing the AWA from the LIF system (it is linear) as this was
done by hand. Further, Gupta and Fisher build a canonical representation of the
LIF and they have used the older BDD package from David Long.

n     # BDD nodes of         peak # BDD nodes of      AWA         LIF
      transition function    Current / Processed      CPU time    CPU time
2     21                     10 / 11                  0.1s        0.0s
4     59                     34 / 42                  0.1s        0.0s
8     189                    318 / 717                1.0s        4.0s
10    269                    971 / 3063               8.2s        81.0s
12    371                    3438 / 13037             71.7s       15241.5s

Table 1. Empirical results of the emptiness test for an n-bit counter

4 Exponentially Inductive Boolean Functions 

The structure of this section parallels that of §3. After defining EIFs, we show 
how their equality problem can be decided using tree automata. The decision 
procedure however is not as direct as it is for LIFs. One problem is that inputs 
to EIFs are words not trees. We solve this by labeling the interior nodes of trees 
with a dummy symbol. However, the main problem is that the words must be 
of length $2^n$. This restriction cannot be checked by tree automata and we solve 
this by deciding separately if a tree automaton accepts a complete tree. 

4.1 Definitions of EIFs 

Syntax An EIF formula (over V and F) is a pair $(\alpha, \beta)$, with $\alpha \in B(V)$ 
and $\beta \in B(F \times \{L,R\})$. We write $f^L$ [respectively $f^R$] for the variable $(f,L)$ 
[respectively $(f,R)$] in $F \times \{L,R\}$. An EIF system (over V and F) is a pair 
$(F, \eta)$, where F and $\eta$ are defined as for a LIF system. Similarly to LIF systems, 
we write $(\alpha_f, \beta_f)$ for $\eta(f) = (\alpha, \beta)$. 

Semantics Let S be an EIF system over V and F. An evaluation of S on 
a word $w = b_1 \ldots b_{2^n} \in (\mathbb{B}^r)^+$ is a complete binary $\mathbb{B}^s$-labeled tree $\tau$ with 
$\mathrm{front}(\tau) = d_1 \ldots d_{2^n} \in (\mathbb{B}^s)^+$ and for $1 \le k \le s$: 

i) For $b_i = (a_1, \ldots, a_r)$ and $d_i = (c_1, \ldots, c_s)$, with $1 \le i \le 2^n$, 

$c_k = 1$ iff $\{v_l \mid a_l = 1, \text{ for } 1 \le l \le r\} \models \alpha_{f_k}$. 

ii) For each inner node $u \in \mathrm{dom}(\tau)$ with $\tau(u.0) = (c'_1, \ldots, c'_s)$, $\tau(u.1) = 
(c''_1, \ldots, c''_s)$, and $\tau(u) = (c_1, \ldots, c_s)$: 

$c_k = 1$ iff $\{f_l^L \mid c'_l = 1, \text{ for } 1 \le l \le s\} \cup 
\{f_l^R \mid c''_l = 1, \text{ for } 1 \le l \le s\} \models \beta_{f_k}$. 

Let $\Sigma^{2^+} = \{w \in \Sigma^* \mid |w| = 2^n, \text{ for some } n \in \mathbb{N}\}$. As with LIFs, the evaluation 
$\tau$ is uniquely defined; hence $f_k \in F$ and S together define a function 
$f_k^S : (\mathbb{B}^r)^{2^+} \to \mathbb{B}$. Namely $f_k^S(w) = c_k$, where $\tau$ is the unique evaluation of S 
on w and $\tau(\lambda) = (c_1, \ldots, c_s)$. EIF-representable is defined analogously to 
LIF-representable. 

For example, the tree implementation of the parameterized parity circuit 
from the introduction is described by the EIF system S over V = {x} and 
F = {tree_parity} with the EIF formula: 

$\alpha_{tree\_parity} = x, \qquad \beta_{tree\_parity} = tree\_parity^L \oplus tree\_parity^R$. 

Here the value of the EIF $tree\_parity^S$ applied to a word $w = b_1 \ldots b_{2^n} \in \mathbb{B}^+$ is 
the value of the parity function applied to $(b_1, \ldots, b_{2^n})$. 
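
The evaluation of an EIF system on a word of length $2^n$ can be illustrated with a small bottom-up computation. The following is only a sketch under stated assumptions: the dictionary encoding of a system, the function name `evaluate`, and the representation of letters as sets of true variables are all hypothetical choices made for illustration and do not come from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: each function symbol f maps to a pair (alpha_f, beta_f).
# alpha_f takes the set of input variables that are 1 at a leaf.
# beta_f takes the sets of function symbols that are 1 at the left and right child.
Alpha = Callable[[Set[str]], bool]
Beta = Callable[[Set[str], Set[str]], bool]
System = Dict[str, Tuple[Alpha, Beta]]


def evaluate(system: System, word: List[Set[str]]) -> Set[str]:
    """Return the function symbols whose value is 1 at the root of the
    complete binary evaluation tree whose front is `word`."""
    n = len(word)
    if n == 0 or n & (n - 1):
        raise ValueError("word length must be a power of two")
    # Leaves: apply the base formulas alpha_f to each letter.
    level = [{f for f, (alpha, _) in system.items() if alpha(b)} for b in word]
    # Inner nodes: combine neighbouring subtrees with the step formulas beta_f.
    while len(level) > 1:
        level = [
            {f for f, (_, beta) in system.items() if beta(level[i], level[i + 1])}
            for i in range(0, len(level), 2)
        ]
    return level[0]


# The tree parity system: alpha = x, beta = tree_parity^L xor tree_parity^R.
tree_parity: System = {
    "tree_parity": (
        lambda b: "x" in b,
        lambda left, right: ("tree_parity" in left) != ("tree_parity" in right),
    )
}

odd_word = [{"x"}, set(), {"x"}, {"x"}]   # three inputs set to 1
even_word = [{"x"}, {"x"}]                # two inputs set to 1
odd_result = evaluate(tree_parity, odd_word)
even_result = evaluate(tree_parity, even_word)
```

Each pass of the loop halves the number of nodes, so a word of length $2^n$ is processed in n combining rounds, mirroring the height of the evaluation tree.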

4.2 Equivalence of EIF Systems and ATAs 

Using ATAs we can characterize the EIF-representable functions. To interpret 
a word in the domain of an EIF as a tree, we identify a word $b_1 \ldots b_{2^n} \in \Sigma^*$ 
with the complete tree t over $\Sigma \uplus \{\#\}$, where $\mathrm{front}(t) = b_1 \ldots b_{2^n}$ and all inner 
nodes are labeled with the dummy symbol #. In the following, $\Sigma_\#$ stands for 
$\Sigma \uplus \{\#\}$. We call a tree t over $\Sigma_\#$ a $\Sigma$-leaf-labeled tree when (i) if w is a leaf, then 
$t(w) \in \Sigma$ and (ii) if w is an inner node, then $t(w) = \#$. 

Normal forms for EIF systems can be defined and obtained as for LIF systems 
and the proof of Theorem 1 can, with minor modifications, be generalized to 
EIFs. 

Theorem 3. EIF systems are equivalent to ATAs if the input trees are restricted 
to complete, leaf-labeled trees. In particular: 

i) Let $A = (\mathbb{B}^r_\#, Q, q_0, F, \delta)$ be an ATA. There is a normal form EIF system 
S over $V = \{v_1, \ldots, v_r\}$ and Q, such that for all $q \in Q$ and any complete 
$\mathbb{B}^r$-leaf-labeled tree t, 

$q^S(\mathrm{front}(t)) = 1$ iff A accepts t from q. 

ii) Let S be a normal form EIF system over V and F. There is an ATA A 
with the state set $F \uplus \{q_{base}, q_{step}\}$, such that A accepts from $f \in F$ only 
$\mathbb{B}^{|V|}$-leaf-labeled trees, and for any complete $\mathbb{B}^{|V|}$-leaf-labeled tree t, 

A accepts t from f iff $f^S(\mathrm{front}(t)) = 1$. 

4.3 Deciding EIF Equality 

The equality problem for EIFs and the size of an instance are defined similarly 
to LIFs. We cannot generalize the decision procedure from §3.3 to EIFs since 
we are only interested in trees of a restricted form: complete leaf-labeled binary 
trees. Unfortunately completeness is not a regular property, i.e. one recognizable 
by tree automata, and hence we cannot reduce the problem to an emptiness 
problem. Instead, we reduce the problem to the complete-tree-containment problem 
(CTCP) for NTAs, which is to decide whether a given NTA accepts a complete 
tree. 

Theorem 4. The equality problem for EIFs is in EXPSPACE. 

Proof. Let S over V and F, and T over V and G be EIF systems, and let $f \in F$ 
and $g \in G$ be given. Let $\hat{S}$ be the EIF system over V and $\{\hat{f}\} \uplus F \uplus G$ with the 
additional EIF formula $\alpha_{\hat{f}} = \neg(\alpha_f \leftrightarrow \alpha_g)$ and $\beta_{\hat{f}} = \neg(\beta_f \leftrightarrow \beta_g)$. We normalize 
$\hat{S}$, and by Theorem 3(ii) construct an ATA A with the initial state $\hat{f}$, such that 
$f^S \neq g^T$ iff A accepts a complete tree. A has $2|\{\hat{f}\} \uplus F \uplus G| + 2$ states and the size 
of the alphabet is $2^{|V|} + 1$. From A we can construct an equivalent NTA B 
that has at most exponentially many states. Hence we have reduced the equality 
problem for EIFs to CTCP for NTAs. The required space for the reduction is 
$O(2^{|V|} 2^{2|F|+|G|})$. 

We now show that CTCP for NTAs is in PSPACE. For the NTA $A = 
(\Sigma, Q, q_0, F, \delta)$ we construct the AWA $A' = (\{1\}, Q, q_0, F, \delta')$ with 
$\delta'(q, 1) = \bigvee_{a \in \Sigma} \bigvee_{(p,p') \in \delta(q,a)} (p \wedge p')$. It is easy to prove that A accepts a complete tree of 
height h iff A' accepts a word of length h. From this it follows that CTCP for NTAs 
is in PSPACE because the emptiness problem for AWAs is in PSPACE [12,17]. □ 
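
The idea behind the reduction to a unary AWA can be sketched with an explicit-state computation that proceeds height by height. This is only an illustration under assumptions not fixed by the text: the NTA is taken to be top-down, its transition relation is a hypothetical dictionary from (state, letter) to a set of successor pairs, a run is taken to accept when the states reached below the leaves are final, and only trees of height at least 1 are considered. The function name `accepts_complete_tree` is likewise invented for this sketch.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], Set[Tuple[State, State]]]


def accepts_complete_tree(delta: Delta, q0: State, final: Set[State]) -> bool:
    """Decide whether some complete binary tree is accepted from q0.

    level_h holds the states from which some complete tree of height h is
    accepted. Both subtrees of a node must have the same height, so a state q
    belongs to level_{h+1} iff some transition (p, p') of q has p and p' in
    level_h. This mirrors delta'(q, 1) = OR over a and (p, p') of (p AND p').
    """
    current: FrozenSet[State] = frozenset(final)  # height 0: below the leaves
    seen: Set[FrozenSet[State]] = set()
    while current not in seen:
        seen.add(current)
        current = frozenset(
            q
            for (q, _letter), pairs in delta.items()
            if any(p in current and p2 in current for p, p2 in pairs)
        )
        if q0 in current:
            return True
    # The sequence of levels has started to repeat without ever containing q0.
    return False


# State 0 moves to (1, 1) and state 1 is final: a one-node tree is accepted.
delta_yes: Delta = {(0, "a"): {(1, 1)}, (1, "a"): {(1, 1)}}
# State 0 needs its left child to be in state 0 again, which never bottoms out.
delta_no: Delta = {(0, "a"): {(0, 1)}}

result_yes = accepts_complete_tree(delta_yes, 0, {1})
result_no = accepts_complete_tree(delta_no, 0, {1})
```

The sketch stores all visited levels for simplicity, which may take exponential space; a polynomial-space variant would instead keep only the current level and a counter bounded by the number of subsets of Q.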

In [13], it is proved that the equality problem for EIFs and CTCP for ATAs 
(to decide if an ATA accepts a complete tree) are both EXPSPACE-hard. We 
omit the proof, which is quite technical, due to space limitations. 



5 Comparisons and Related Work 

Our work was motivated by that of Gupta and Fisher [6,7,8] and we begin by 
comparing our LIFs and EIFs with theirs, which we will call LIF$_g$ and EIF$_g$. 

For each $n \ge 1$, a LIF$_g$ f is given by a Boolean function, called the n-instance 
of f and denoted by $f^n$, where $f^1 : \mathbb{B}^r \to \mathbb{B}$ and $f^n : \mathbb{B}^{r+s} \to \mathbb{B}$ for n > 1 (r is 
the number of n-instance inputs and s is the number of (n-1)-instance function 
inputs). Further it must hold that for all m, n > 1 the m-instance and the 
n-instance of f are equal, i.e. $f^m = f^n$. By means of the parity function we explain 
how the value of a LIF$_g$ is calculated. The n-instances of serial_parity (using 
their notation) are: 

$serial\_parity^1 = b^1$, 

$serial\_parity^n = b^n \oplus serial\_parity^{n-1}$ for n > 1. 

The value of the LIF$_g$ serial_parity on the word $b_1 \ldots b_n \in (\mathbb{B}^r)^+$, written as 
$serial\_parity(b_1, \ldots, b_n)$, is the value of the 1-instance $serial\_parity^1$ applied to 
$b_1$ for n = 1. For n > 1, it is the value of the n-instance $serial\_parity^n$ applied 
to $b_n$ and $serial\_parity(b_1, \ldots, b_{n-1})$. 
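
The way a value is computed from the 1-instance and the n-instance can be sketched as a direct recursion. The Python function below is a hypothetical illustration for the serial parity example only; it assumes single-bit letters (r = 1) represented as integers 0 and 1.

```python
from typing import List


def serial_parity(bits: List[int]) -> int:
    """Value of serial_parity on the word b_1 ... b_n, with n >= 1."""
    if not bits:
        raise ValueError("the word must be non-empty")
    if len(bits) == 1:
        # 1-instance: serial_parity^1 = b^1
        return bits[0]
    # n-instance: serial_parity^n = b^n xor serial_parity^(n-1),
    # applied to b_n and the value on the prefix b_1 ... b_(n-1).
    return bits[-1] ^ serial_parity(bits[:-1])


value_odd = serial_parity([1, 0, 1, 1])
value_even = serial_parity([1, 1])
```

The same function body is used for every n > 1, which reflects the requirement that all n-instances for n > 1 are equal.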

The definitions of a “LIF formula” and a “LIF system” correspond to the 
definition of a “LIF$_g$”. Moreover, the way the “value” of a LIF$_g$ is calculated 
corresponds to our definition of “evaluation”. Hence both formalisms are equivalent. 
However, the algorithms, data structures, and complexity of our approaches are 
completely different! 

Gupta and Fisher formalize LIF$_g$s using a data structure based on BDDs 
where terminal nodes are not just the constants 0 and 1, but also pointers to 
other BDDs. They then prove that each LIF system has a canonical representation 
that can be obtained in doubly exponential time and space in the 
worst case. In contrast, we have given a decision procedure (Theorem 2) that 
requires polynomial space, which is a doubly exponential improvement in space 
and an exponential improvement in time. Despite its worse space complexity, our 
algorithm based on BDD-represented AWAs may give better results in practice 
than our PSPACE decision procedure. This depends on whether the BDDs used 
require polynomial or exponential space. If the space required is polynomial, 
then the resulting AWA and its emptiness test require only polynomial space. 
In the exponential case, as there are only 2|F| + 2 states, the emptiness test 
requires singly exponential time and space. This case also represents an exponential 
improvement over Gupta and Fisher’s results, both in time and space. 

An EIF$_g$ f has, like a LIF$_g$, for each $n \ge 0$, an n-instance function $f^{2^n}$, where 
$f^{2^0} : \mathbb{B}^r \to \mathbb{B}$ and, for n > 0, $f^{2^n}$ is a Boolean combination of three EIF$_g$s, e, g 
and h, i.e. $f^{2^n} : \mathbb{B}^3 \to \mathbb{B}$. Further it must hold that $f^{2^m} = f^{2^n}$ for all m, n > 0. 
The value of the EIF$_g$ f on the word $b_1 \ldots b_{2^n} \in (\mathbb{B}^r)^+$, written as $f(b_1, \ldots, b_{2^n})$, 
is the value of the 0-instance $f^{2^0}$ applied to $b_1$ if n = 0. For n > 0, it is the value 
of the n-instance $f^{2^n}$ applied to the value of the EIF$_g$ e of the left half of the 
word, i.e. $e(b_1, \ldots, b_{2^{n-1}})$, and to the values of the EIF$_g$s g and h of the right 
half of the word, i.e. $g(b_{2^{n-1}+1}, \ldots, b_{2^n})$ and $h(b_{2^{n-1}+1}, \ldots, b_{2^n})$. 

EIF$_g$s are strictly less expressive than EIFs. Indeed, since not every Boolean 
function for the n-instance function of an EIF$_g$ (for n > 0) is allowed, even 
simple functions cannot be described by an EIF$_g$, e.g., $F : \mathbb{B}^{2^+} \to \mathbb{B}$ with 
F(w) = 1 iff w = 0000 or w = 1100 or w = 1011. The reason is similar to 
why deterministic top-down tree automata are weaker than nondeterministic 
top-down tree automata; the restriction on the n-instance function of an EIF$_g$ 
stems from the data structure proposed for EIF$_g$s in [6,7] in order to have a 
canonical representation. On the other hand, it is easy to see that F is 
EIF-representable. 

Our results on the complexity of the equality problem for EIFs are, to our 
knowledge, the first such results given in the literature. Neither we nor Gupta 
and Fisher have implemented a decision procedure for the equality problem for 
EIFs or EIF$_g$s. 

We have seen that LIFs and EIFs can be reduced to word and tree automata. 
Gupta and Fisher also give in [6,7] an extension of their data structure for LIF$_g$s 
and EIF$_g$s that handles more than one induction parameter with the restriction 
that the induction parameters must be mutually independent. We conjecture 
that it may be possible to develop similar models in our setting based on 
2-dimensional word automata (as described in [5]) and their extension to trees. 
However, this remains as future work. 

There are also similarities between our work and the description of circuits by 
equations of Brzozowski and Leiss in [2]. A system of equations S has the form 
$X_i = \bigcup_{a \in \Sigma} \{a\} \cdot F_{i,a} \cup \delta_i$ (for $1 \le i \le n$) where the $F_{i,a}$ are Boolean functions in 
the variables $X_i$, and each $\delta_i$ is either $\{\lambda\}$ or $\emptyset$. It is shown in [2] that a solution 
to S is unique and regular, i.e., if each $X_i$ is interpreted with $L_i \subseteq \Sigma^*$ and the 
$L_i$ satisfy the equations in S, then the $L_i$ are unique and regular. LIF systems 
offer advantages in describing parameterized circuits. For example, with LIFs 
one directly describes the “input ports” using the variables V. In contrast, a 
system of equations must use the alphabet $\mathbb{B}^r$ and cannot “mix” input pins and 
the signals of the internal wiring (and the same holds for outputs). Furthermore, 
descriptions using LIFs cleanly separate the base and step cases of the circuit 
family, which is not the case with [2]. 

Finally, note that the use of BDDs to represent word and tree automata, 
without alternation, is explored in [10,14]. There, BDD-represented automata 
are used to provide a decision procedure for monadic second-order logics on 
words and trees. This decision procedure is implemented in the Mona system, 
and Mona can be used to reason about LIF systems [1]: a LIF system is described 
by a monadic second-order formula, which Mona translates into a deterministic 
word automaton. Although this has the advantage of using an existing decision 
procedure, the complexity can be considerably worse both in theory and in 
practice. For example, for a 12-bit counter Mona (version 1.3) needs more than 
an hour to build the automaton and the number of BDD nodes is an order of 
magnitude larger than what is needed for our emptiness test for AWAs. 

6 Conclusions 

We have shown that LIFs and EIFs can be understood and analyzed using stan- 
dard formalisms and results from automata theory. Not only is this conceptually 
attractive, but we also obtain better results for the decision problem for LIFs 
and the first complexity results for EIFs. The n-bit counter example in §3.3 in- 
dicates that our approach, at least in some cases, is faster in practice. However, 
an in-depth experimental comparison of the procedures remains as future work. 

Abstract. Any formal method or tool is almost certainly more often 
applied in situations where the outcome is failure (a counterexample) 
rather than success (a correctness proof). We present a method for symbolic 
model checking that can lead to significant time and memory savings for 
model-checking runs that fail, while incurring only a small overhead for 
model-checking runs that succeed. Our method discovers an error as soon 
as it cannot be prevented, which can be long before it actually occurs; for 
example, the violation of an invariant may become unpreventable many 
transitions before the invariant is violated. 

The key observation is that “unpreventability” is a local property of 
a single module: an error is unpreventable in a module state if no en- 
vironment can prevent it. Therefore, unpreventability is inexpensive to 
compute for each module, yet can save much work in the state explo- 
ration of the global, compound system. Based on different degrees of 
information available about the environment, we define and implement 
several notions of “unpreventability,” including the standard notion of 
uncontrollability from discrete-event control. We present experimental 
results for two examples, a distributed database protocol and a wireless 
communication protocol. 



1 Introduction 

It has been argued repeatedly that the main benefit of formal methods is falsifi- 
cation, not verification; that formal analysis can only demonstrate the presence 
of errors, not their absence. The fundamental reason for this is, of course, that 
mathematics can be applied, inherently, only to an abstract formal model of a 
computing system, not to the actual artifact. Furthermore, even when a formal 
model is verified, the successful verification attempt is typically preceded by 
many iterations of unsuccessful verification attempts followed by model revisi- 
ons. Therefore, in practice, every formal method and tool is much more often 
applied in situations where the outcome is failure (a counterexample), rather 
than success (a correctness proof). 

Yet most optimizations in formal methods and tools are tuned towards suc- 
cess. For example, consider the use of BDDs and similar data structures in 
model checking. Because of their canonicity, BDDs are often most effective in 
applications that involve equivalence checking between complex boolean func- 
tions. Successful model checking is such an application: when the set of reachable 
states is computed by iterating image computations, successful termination is 
detected by an equivalence check (between the newly explored and the previously 
explored states). By contrast, when model checking fails, a counterexample is de- 
tected before the image iteration terminates, and other data structures, perhaps 
noncanonical ones, may be more efficient [BCCZ99]. To point out a second ex- 
ample, much ink has been spent discussing whether “forward” or “backward” 
state exploration is preferable (see, e.g., [HKQ98]). If we expect to find a coun- 
terexample, then the answer seems clear but rarely practiced: the simultaneous, 
dove-tailed iteration of images and pre-images is likely to find the counterex- 
ample by looking at fewer states than either unidirectional method. Third, in 
compositional methods, the emphasis is almost invariably on how to decompose 
correctness proofs (see, e.g., [HQR98]), not on how to find counterexamples by 
looking at individual system components instead of their product. In this paper, 
we address this third issue. 

Consider a system with two processes. The first process waits on a binary 
input from the second process; if the input is 0, it proceeds correctly; if the 
input is 1, it proceeds for n transitions before entering an error state. Suppose 
the second process may indeed output 1. By global state exploration (forward 
or backward), n + 1 iterations are necessary to encounter the error and return 
a counterexample. This is despite the fact that things may go terribly wrong, 
without chance of recovery, already in the first transition. We propose to instead 
proceed in two steps. First, we compute on each individual process (i.e., typically 
on a small state space) the states that are controllable to satisfy the requirements. 
In our example, the initial state is controllable (because the environment may 
output 0 and avoid an error), but the state following a single 1 input is not 
(no environment can avoid the error). Second, on the global state space, we 
restrict search to the controllable states, and report an error as soon as they are 
left. In our example, the error is reported after a single image (or pre-image) 
computation on the global state space. (A counterexample can be produced 
from this and the precomputed controllability information of the first process.) 
Note that both steps are fully automatic. Moreover, the lower number of global 
iterations usually translates into lower memory requirements, because BDD size 
often grows with the number of iterations. Finally, when no counterexample is 
found, the overhead of our method is mostly in performing step 1, which does 
not involve the global state space and therefore is usually uncritical. 

We present several refinements of this basic idea, and demonstrate the effi- 
ciency of the method with two examples, a distributed database protocol and a 
wireless communication protocol. In the first example, there are two sites that 
can sell and buy back seats on the same airplane [BGM92] . The protocol aims at 
ensuring that no more seats are sold than the total available, while enabling the 
two sites to exchange unsold seats, in case one site wishes to sell more seats than 
initially allotted. The second example is from the Two-Chip Intercom (TCI) 
project of the Berkeley Wireless Research Center [BWR]. The TCI network is 
a wireless local network which allows approximately 40 remotes, one for each 
user, to transmit voice with point-to-point and broadcast communication. The 
operation of the network is coordinated by a base station, which assigns chan- 
nels to the users through a TDMA scheme. In both examples, we first found 
errors that occurred in our initial formulation of the models, and then seeded 
bugs at random. Our methods succeeded in reducing the number of global image 
computation steps required for finding the errors, often reducing the maximum 
number of BDD nodes used in the verification process. The methods are parti- 
cularly effective when the BDDs representing the controllable states are small 
in comparison to the BDD representing the set of reachable states. 

To explain several fine points about our method, we need to be more formal. 
To study the controllability of a module P, we consider a game between P 
and its environment: the moves of P consist in choosing new values for the 
variables controlled by P; the moves of the environment of P consist in choosing 
new values for the input variables of P. A state s of P is controllable with 
respect to the invariant $\Box\varphi$ if the environment has a strategy that ensures that 
$\varphi$ always holds. Hence, if a state s is not controllable, we know that P from 
s can reach a $\neg\varphi$-state, regardless of how the environment behaves. The set 
$C_P$ of controllable states of P can be computed iteratively, using the standard 
algorithm for solving safety games, which differs from backward reachability only 
in the definition of the pre-image operator. Symmetrically, we can compute the 
set $C_Q$ of controllable states of Q w.r.t. $\Box\varphi$. Then, instead of checking that 
$P \parallel Q$ stays within the invariant $\Box\varphi$, we check whether $P \parallel Q$ stays within the 
stronger invariant $\Box(C_P \wedge C_Q)$. As soon as $P \parallel Q$ reaches a state s that violates 
a controllability predicate, say, $C_P$, by retracing the computation of $C_P$, taking 
into account also Q, we can construct a path of $P \parallel Q$ from s to a state t that 
violates the specification $\varphi$. Together with a path from an initial state to s, this 
provides a counterexample to $\Box\varphi$. While the error occurs only at t, we detect it 
already at s, as soon as it cannot be prevented. The method can be extended to 
arbitrary LTL requirements. 
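
The computation of the controllable states can be illustrated on an explicit-state version of the two-process example given earlier. The sketch below is an assumption-laden illustration, not the symbolic algorithm of the paper: the program counter values, the delay `N`, and the function names are hypothetical, states are pairs of a controlled value `pc` and an external input `inp`, and sets are enumerated explicitly rather than represented by BDDs.

```python
from itertools import product

N = 3  # hypothetical number of steps between reading 1 and reaching the error


def module_moves(state):
    """Possible next values of the controlled variable pc (deterministic here)."""
    pc, inp = state
    if pc == "wait":
        return {"ok"} if inp == 0 else {1}
    if pc == "ok" or pc == "err":
        return {pc}
    return {pc + 1} if pc < N else {"err"}


def controllable_states(states, env_values, safe):
    """Greatest fixpoint for the safety game: a state stays in the set if some
    environment move keeps every module move inside the current set."""
    ctrl = {s for s in states if safe(s)}
    changed = True
    while changed:
        changed = False
        for s in list(ctrl):
            ok = any(
                all((c, e) in ctrl for c in module_moves(s))
                for e in env_values
            )
            if not ok:
                ctrl.remove(s)
                changed = True
    return ctrl


pcs = ["wait", "ok", "err"] + list(range(1, N + 1))
states = set(product(pcs, [0, 1]))
ctrl = controllable_states(states, [0, 1], lambda s: s[0] != "err")
print(sorted(ctrl, key=str))
```

In this encoding the state `("wait", 0)` is controllable, whereas `("wait", 1)`, in which the input 1 has already been read, is not, although the error state itself is still N + 1 transitions away.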

The notion of controllability defined above is classical, but it is often not 
strong enough to enable the early detection of errors. To understand why, consi- 
der an invariant that relates a variable x in module P with a variable y in module 
Q, for example by requiring that x = y, and assume that y is an input variable 
to P. Consider a state s, in which module P is about to change the value of x 
without synchronizing this change with Q. Intuitively, it seems obvious that such 
a change can break the invariant, and that the state should not be considered 
controllable (how can Q possibly know that this is going to happen, and change 
the value of y correspondingly?). However, according to the classical definition 
of controllability, the state s is controllable: in fact, the environment has a move 
(changing the value of y correspondingly) to control P. This example indicates 
that in order to obtain stronger (and more effective) notions of controllability, 
we need to compute the set of controllable states by taking into account the real 
capabilities of the other modules composing the system. We introduce three such 
stronger notions of controllability: constrained, lazy, and bounded 
controllability. Our experimental results demonstrate that there is a distinct advantage in 
using these stronger notions of controllability. 

Lazy controllability can be applied to systems in which all the modules are 
lazy, i.e., in which the modules always have the option of leaving unchanged the values of 
the variables they control [AH99]. Thus, laziness models the assumption of speed 
independence, and is used heavily in the modeling of asynchronous systems. If 
the environment is lazy, then there is no way of preventing the environment from 
always choosing its “stutter” move. Hence, we can strengthen the definition of 
controllability by requiring that the stutter strategy of the environment, rather 
than an arbitrary strategy, must be able to control. In the above example, the 
state s of module P is clearly not lazily controllable, since a change of x cannot 
be controlled by leaving y unchanged. Constrained controllability is a notion of 
controllability that can be used also when the system is not lazy. Constrained 
controllability takes into account, in computing the sets of controllable states, 
which moves are possible for the environment. To compute the set of constrai- 
nedly controllable states of a module P, we construct a transition relation that 
constrains the moves of the environment. This is done by automatically abstrac- 
ting away from the transition relations of the other modules the variables that 
are not shared by P. We then define the controllable states by considering a 
game between P and a so constrained environment. Finally, bounded controllabi- 
lity is a notion that can again be applied to any system, and it generalizes both 
lazy and constrained controllability. It considers environments that have both a 
set of unavoidable moves (such as the lazy move for lazy systems), and possible 
moves (by considering constraints to the moves, similarly to constrained 
controllability). We also introduce a technique called iterative strengthening, which 
can be used to strengthen any of these notions of controllability. In essence, it 
is based on the idea that a module, in order to control another module, cannot 
use a move that would cause it to leave its own set of controllable states. 
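
The difference between classical and lazy controllability can be seen on a small explicit-state version of the x = y example. The following sketch is hypothetical throughout: the controlled state is encoded as a pair (x, mode), where mode "flip" means that P is about to change x, the external variable is y, and lazy controllability is modeled simply by restricting the environment to its stutter move.

```python
from itertools import product


def module_moves(state):
    """Next controlled value: flip x once if mode is "flip", else keep x."""
    (x, mode), _y = state
    return {(1 - x, "hold")} if mode == "flip" else {(x, "hold")}


def controllable(states, env_moves, safe):
    """Safety-game fixpoint where env_moves(s) lists the next external values
    the environment is allowed to choose in state s."""
    ctrl = {s for s in states if safe(s)}
    changed = True
    while changed:
        changed = False
        for s in list(ctrl):
            ok = any(
                all((c, e) in ctrl for c in module_moves(s))
                for e in env_moves(s)
            )
            if not ok:
                ctrl.remove(s)
                changed = True
    return ctrl


states = set(product(product([0, 1], ["hold", "flip"]), [0, 1]))
safe = lambda s: s[0][0] == s[1]          # the invariant x = y

# Classical: the environment may choose any next value of y.
classical = controllable(states, lambda s: {0, 1}, safe)
# Lazy: only the stutter move (y unchanged) may be used to control.
lazy = controllable(states, lambda s: {s[1]}, safe)

about_to_flip = ((0, "flip"), 0)
```

The state `about_to_flip` is classically controllable, because the environment could change y in the same step, but it is not lazily controllable, because leaving y unchanged breaks x = y.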

It is worth noting that the techniques developed in this paper can also be used 
in an informal verification environment: after computing the uncontrollable 
states for each of the components, one can simulate the design and check if any 
of these uncontrollable states can be reached. This is similar to the techniques 
of retrograde analysis [JSAA97] and target enlargement [YD98] in simulation. The 
main idea of retrograde analysis and target enlargement is that the set of states 
that violate the invariants is “enlarged” with its preimages, and hence the 
chance of hitting this enlarged set is increased. Our techniques not only add 
modularity to the computation of target enlargement, they also allow one to 
detect the violation of liveness properties through simulation. 

The algorithmic control of reactive systems has been studied extensively 
before (see, e.g., [RW89,EJ91,Tho95]). However, the use of controllability in 
automatic verification is relatively new (see, e.g., [KV96,AHK97,AdAHM99]). 
The work closest to ours is [ASSSV94], where transition systems for components 
are minimized by taking into account if a state satisfies or violates a given CTL 
property under all environments. In [Dil88], autofailure captures the concept 
that no environment can prevent failure and is used to compare the equivalence 
of asynchronous circuits. 

2 Preliminaries 

Given a set V of typed variables, a state s over V is an assignment for V that 
assigns to each $x \in V$ a value $s[x]$. We denote by States(V) the set of 
all states over V, and by $\mathcal{P}(V)$ the set of predicates over V. Furthermore, we 
denote by $V' = \{x' \mid x \in V\}$ the set obtained by priming each variable in V. 
Given a predicate $H \in \mathcal{P}(V)$, we denote by $H' \in \mathcal{P}(V')$ the predicate obtained 
by replacing in H every $x \in V$ with $x' \in V'$. A module $P = (\mathcal{C}_P, \mathcal{E}_P, I_P, T_P)$ 
consists of the following components: 

1. A (finite) set $\mathcal{C}_P$ of controlled variables, each with finite domain, consisting 
of the variables whose values can be accessed and modified by P. 

2. A (finite) set $\mathcal{E}_P$ of external variables, each with finite domain, consisting of 
the variables whose values can be accessed, but not modified, by P. 

3. A transition predicate $T_P \in \mathcal{P}(\mathcal{C}_P \cup \mathcal{E}_P \cup \mathcal{C}'_P)$. 

4. An initial predicate $I_P \in \mathcal{P}(\mathcal{C}_P)$. 

We denote by $V_P = \mathcal{C}_P \cup \mathcal{E}_P$ the set of variables mentioned by the module. 
Given a state s over $V_P$, we write $s \models I_P$ if $I_P$ is satisfied under the variable 
interpretation specified by s. Given two states s, s' over $V_P$, we write $(s, s') \models T_P$ 
if predicate $T_P$ is satisfied by the interpretation that assigns to $x \in V_P$ the 
value $s[x]$, and to $x' \in V'_P$ the value $s'[x]$. A module P is non-blocking if the 
predicate $I_P$ is satisfiable, i.e., if the module has at least one initial state, and if 
the assertion $\forall V_P . \exists \mathcal{C}'_P . T_P$ holds, so that every state has a successor. 

A trace of module P is a finite sequence of states $s_0, s_1, s_2, \ldots, s_n \in$ States($V_P$), 
where $n \ge 0$ and $(s_k, s_{k+1}) \models T_P$ for all $0 \le k < n$; the trace is initial if $s_0 \models I_P$. 
We denote by $\mathcal{L}(P)$ the set of initial traces of module P. For a module P, we 
consider specifications expressed by linear-time temporal logic (LTL) formulas 
whose atomic predicates are in $\mathcal{P}(V_P)$. As usual, given an LTL formula $\varphi$, we 
write $P \models \varphi$ iff $\sigma \models \varphi$ for all $\sigma \in \mathcal{L}(P)$. 

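Two modules P and Q are composable if $\mathcal{C}_P \cap \mathcal{C}_Q = \emptyset$; in this case, their 
parallel composition $P \parallel Q$ is defined as: $P \parallel Q = (\mathcal{C}_P \cup \mathcal{C}_Q, (\mathcal{E}_P \cup \mathcal{E}_Q) \setminus (\mathcal{C}_P \cup 
\mathcal{C}_Q), I_P \wedge I_Q, T_P \wedge T_Q)$. Note that composition preserves non-blockingness. 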

We assume that all predicates are represented in such a way that boolean 
operations and existential quantification of variables are efficiently computable. 
Likewise, we assume that satisfiability of all predicates can be checked efficiently. 
Binary decision diagrams (BDDs) provide a suitable representation [Bry86]. 

Controllability. We can view the interaction between a module P and its 
environment as a game. At each round of the game, the module P chooses the next 
values for controlled variables $\mathcal{C}_P$, while the environment chooses the next values 
for the external variables $\mathcal{E}_P$. Given an LTL specification $\varphi$, we say that a state s 
of P is controllable with respect to $\varphi$ if the environment can ensure that all traces 
from s satisfy $\varphi$. To formalize this definition, we use the notion of strategy. A 
module strategy $\pi$ for P is a mapping $\pi$ : States($V_P$)$^+$ $\to$ States($\mathcal{C}_P$) that maps each 
finite sequence $s_0, s_1, \ldots, s_k$ of module states into a state $\pi(s_0, s_1, \ldots, s_k)$ such 
that $(s_k, \pi(s_0, s_1, \ldots, s_k)) \models T_P$. Similarly, an environment strategy $\eta$ for P is a 
mapping $\eta$ : States($V_P$)$^+$ $\to$ States($\mathcal{E}_P$) that maps each finite sequence of module 
states into a state specifying the next values of the external variables. Given two 
states $s_1$ and $s_2$ over two disjoint sets of variables $V_1$ and $V_2$, we denote by $s_1 \bowtie s_2$ 
the state over $V_1 \cup V_2$ that agrees with $s_1$ and $s_2$ over the common variables. 
With this notation, for all $s \in$ States($V_P$) and all module strategies $\pi$ and 
environment strategies $\eta$, we define Outcome$(s, \pi, \eta) \in$ States($V_P$)$^\omega$ to be the trace 
$s_0, s_1, s_2, \ldots$ defined by $s_0 = s$ and by $s_{k+1} = \pi(s_0, s_1, \ldots, s_k) \bowtie \eta(s_0, s_1, \ldots, s_k)$. 
Given an LTL formula $\varphi$ over $V_P$, we say that a state $s \in$ States($V_P$) is 
controllable with respect to $\varphi$ iff there is an environment strategy $\eta$ such that, for 
every module strategy $\pi$, we have Outcome$(s, \pi, \eta) \models \varphi$. We let $Ctr(P, \varphi)$ be the 
predicate over $V_P$ defining the set of states of P controllable with respect to $\varphi$. 

Roughly, a state s of P is controllable w.r.t. $\varphi$ exactly when there is an 
environment E for P such that all paths from s in $P \parallel E$ satisfy $\varphi$. Since in general 
E can contain variables not in P, to make the above statement precise we need 
to introduce the notion of extension of a state. Given a state s over V and a 
state t over U, with $V \subseteq U$, we say that t is an extension of s if $s[x] = t[x]$ for 
all $x \in V$. Then, there is a module E composable with P such that all paths from 
extensions of s in $P \parallel E$ satisfy $\varphi$ iff $s \in Ctr(P, \varphi)$ [AdAHM99]. 

3 Early Detection of Invariant Violation 

Forward and backward state exploration. Given a module $R$ and a predicate
$\varphi$ over $V_R$, the problem of invariant verification consists in checking
whether $R \models \Box\varphi$. We can solve this problem using classic forward or
backward state exploration. Forward exploration starts with the set of initial
states of $R$, and iterates a post-image computation, terminating when a state
satisfying $\neg\varphi$ has been reached, or when the set of reachable states of $R$
has been computed. In the first case we conclude $R \not\models \Box\varphi$; in the
second, $R \models \Box\varphi$. Backward exploration starts with the set $\neg\varphi$ of
states violating the invariant, and iterates a pre-image computation,
terminating when a state satisfying $I_R$ has been reached, or when the set of
all states that can reach $\neg\varphi$ has been computed. Again, in the first case we
conclude $R \not\models \Box\varphi$ and in the second $R \models \Box\varphi$. If the answer to the
invariant verification question is negative, these algorithms can also
construct a counterexample $s_0, \ldots, s_m$ of minimal length leading from
$s_0 \models I_R$ to $s_m \models \neg\varphi$, such that for $0 \leq i < m$ we have
$(s_i, s_{i+1}) \models T_R$. If our aim is to find counterexamples quickly, an
algorithm that alternates forward and backward reachability is likely to
explore fewer states than the two unidirectional algorithms. The algorithm
alternates post-image computations starting from $I_R$ with pre-image
computations starting from $\neg\varphi$, terminating as soon as the post- and
pre-images intersect, or as soon as a fixpoint is reached. We denote any of
these three algorithms (or variations thereof) by $\mathit{InvCheck}(R, \varphi)$. We assume
that $\mathit{InvCheck}(R, \varphi)$ returns answer Yes or No, depending on whether
$R \models \Box\varphi$ or $R \not\models \Box\varphi$, along with a counterexample in the latter case.
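The alternating search described above can be illustrated on an explicit graph. The following is only a minimal sketch, not the symbolic (BDD-based) procedure used in the paper: the function name `inv_check_alternating`, the explicit edge list, and the toy graphs are assumptions made for illustration.

```python
from collections import defaultdict

def inv_check_alternating(init, edges, good, all_states):
    """Alternate post-image and pre-image steps on an explicit graph.

    Returns ("Yes", None) if no state outside `good` is reachable from `init`,
    and ("No", path) with a path from an initial state to a violating state
    otherwise. Explicit sets stand in for the predicates of the text.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for s, t in edges:
        succ[s].append(t)
        pred[t].append(s)

    fwd = {s: None for s in init}                          # state -> predecessor
    bwd = {s: None for s in all_states if s not in good}   # state -> successor
    f_front, b_front = list(fwd), list(bwd)
    forward_turn = True

    while True:
        meet = next((s for s in fwd if s in bwd), None)
        if meet is not None:
            # Stitch the forward half and the backward half together.
            prefix, s = [], meet
            while s is not None:
                prefix.append(s)
                s = fwd[s]
            suffix, s = [], bwd[meet]
            while s is not None:
                suffix.append(s)
                s = bwd[s]
            return "No", prefix[::-1] + suffix
        if not f_front or not b_front:
            # One of the two searches reached its fixpoint without meeting.
            return "Yes", None
        if forward_turn:
            new = []
            for s in f_front:
                for t in succ[s]:
                    if t not in fwd:
                        fwd[t] = s
                        new.append(t)
            f_front = new
        else:
            new = []
            for t in b_front:
                for s in pred[t]:
                    if s not in bwd:
                        bwd[s] = t
                        new.append(s)
            b_front = new
        forward_turn = not forward_turn

# Hypothetical toy graphs: a chain that reaches the bad state 4,
# and a graph whose bad state 3 is unreachable from state 0.
chain = inv_check_alternating([0], [(0, 1), (1, 2), (2, 3), (3, 4)],
                              {0, 1, 2, 3}, range(5))
safe = inv_check_alternating([0], [(0, 1), (1, 0), (2, 3)],
                             {0, 1, 2}, range(4))
```

On the chain, the two searches meet in the middle after two steps each, rather than after four forward steps.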

Controllability and early error detection. Given $n \geq 1$ modules
$P_1, P_2, \ldots, P_n$ and a predicate $\varphi \in \mathcal{P}(\bigcup_{i=1}^{n} V_{P_i})$, the modular
version of the invariant verification problem consists in checking whether
$P_1 \parallel \cdots \parallel P_n \models \Box\varphi$. We can use the notion of controllability to try
to detect a violation of the invariant $\varphi$ in fewer iterations of post- or
pre-image computation than the forward and backward exploration algorithms
described above. The idea is to pre-compute the states of each module
$P_1, \ldots, P_n$ that are controllable w.r.t. $\Box\varphi$. We can then detect a violation
of the invariant as soon as we reach a state $s$ that is not controllable for
some of the modules, rather than waiting until we reach a state actually
satisfying $\neg\varphi$. In fact, we know that from $s$ there is a path leading to
$\neg\varphi$ in the global system: for this reason, if a state is not controllable
for some of the modules, we say that the state is doomed.

To implement this idea, let $R = P_1 \parallel \cdots \parallel P_n$, and for $1 \leq i \leq n$, let
$\mathit{abs}_i(\varphi) = \exists (V_R \setminus V_{P_i}) \,.\, \varphi$ be an approximation of $\varphi$ that involves
only the variables of $P_i$; note that $\varphi \rightarrow \mathit{abs}_i(\varphi)$. For each
$1 \leq i \leq n$, we can compute the set $\mathit{Ctr}(P_i, \Box\mathit{abs}_i(\varphi))$ of controllable
states of $P_i$ w.r.t. $\Box\mathit{abs}_i(\varphi)$ using a classical algorithm for safety
games. For a module $P$, the algorithm uses the uncontrollable predecessor
operator $\mathit{UPre}_P : \mathcal{P}(V_P) \to \mathcal{P}(V_P)$, defined by

$$\mathit{UPre}_P(X) = \forall \mathcal{E}'_P \,.\, \exists \mathcal{C}'_P \,.\, (T_P \wedge X') .$$

The predicate $\mathit{UPre}_P(X)$ defines the set of states from which, regardless of
the move of the environment, the module $P$ can resolve its internal
nondeterminism to make $X$ true. Note that a quantifier switch is required to
compute the uncontrollable predecessors, as opposed to the computations of
pre-images and post-images, where only existential quantification is required.
For a module $P$ and an invariant $\Box\varphi$, we can compute the set
$\mathit{Ctr}(P, \Box\varphi)$ of controllable states of $P$ with respect to $\Box\varphi$ by letting
$U_0 = \neg\varphi$, and for $k > 0$, by letting

$$U_k = \neg\varphi \vee \mathit{UPre}_P(U_{k-1}), \tag{1}$$

until we have $U_k = U_{k-1}$, at which point we have $\mathit{Ctr}(P, \Box\varphi) = \neg U_k$. For
$k \geq 0$ the set $U_k$ consists of the states from which the environment cannot
prevent module $P$ from reaching $\neg\varphi$ in at most $k$ steps. Note that for all
$1 \leq i \leq n$, the computation of $\mathit{Ctr}(P_i, \Box\mathit{abs}_i(\varphi))$ is carried out on the
state space of module $P_i$, rather than on the (larger) state space of the
complete system. We can then solve the invariant checking problem
$P_1 \parallel \cdots \parallel P_n \models \Box\varphi$ by executing

$$\mathit{InvCheck}\Bigl(P_1 \parallel \cdots \parallel P_n,\ \varphi \wedge \bigwedge_{i=1}^{n} \mathit{Ctr}(P_i, \Box\mathit{abs}_i(\varphi))\Bigr) . \tag{2}$$

It is necessary to conjoin $\varphi$ to the set of controllable states in the above
check, because for $1 \leq i \leq n$, predicate $\mathit{abs}_i(\varphi)$ (and thus, possibly,
$\mathit{Ctr}(P_i, \Box\mathit{abs}_i(\varphi))$) may be weaker than $\varphi$. If check (2) returns answer
Yes, then we have immediately that $P_1 \parallel \cdots \parallel P_n \models \Box\varphi$. If the check
returns answer No, we can conclude that $P_1 \parallel \cdots \parallel P_n \not\models \Box\varphi$. In this
latter case, the check (2) also returns a partial counterexample
$s_0, s_1, \ldots, s_m$, with $s_m \not\models \mathit{Ctr}(P_j, \Box\mathit{abs}_j(\varphi))$ for some $1 \leq j \leq n$.
If $s_m \models \neg\varphi$, this counterexample is also a counterexample to $\Box\varphi$.
Otherwise, to obtain a counterexample $s_0, \ldots, s_m, s_{m+1}, \ldots, s_{m+r}$ with
$s_{m+r} \models \neg\varphi$, we proceed as follows. Let $U_0, U_1, \ldots, U_k$ be the predicates
computed by Algorithm 1 during the computation of $\mathit{Ctr}(P_j, \Box\mathit{abs}_j(\varphi))$;
note that $s_m \models U_k$. For $l > 0$, given $s_{m+l-1}$, we pick $s_{m+l}$ such that
$s_{m+l} \models U_{k-l}$ and $(s_{m+l-1}, s_{m+l}) \models \bigwedge_{i=1}^{n} T_{P_i}$. The process
terminates as soon as we reach an $l$ such that $s_{m+l} \models \neg\varphi$; since the
implication $U_0 \rightarrow \neg\varphi$ holds, this will occur in at most $k$ steps.
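The fixpoint iteration (1) and the descent through the layers $U_k, \ldots, U_0$ can be sketched on an explicit-state toy model. This is only an illustration under simplifying assumptions: a state is a pair `(c, e)` of one controlled and one external value, `trans(c, e, e2)` is a hypothetical function returning the module's possible next controlled values once the environment has chosen `e2`, and the extension is performed inside a single module rather than in the global system.

```python
from itertools import product

def upre(X, C, E, trans):
    """Uncontrollable predecessors: for every environment move e2,
    the module has a move c2 with (c2, e2) in X."""
    return {
        (c, e)
        for c, e in product(C, E)
        if all(any((c2, e2) in X for c2 in trans(c, e, e2)) for e2 in E)
    }

def doomed_layers(bad, C, E, trans):
    """U_0 = bad, U_k = bad | UPre(U_{k-1}); returns [U_0, ..., U_k] at the fixpoint."""
    layers = [set(bad)]
    while True:
        nxt = set(bad) | upre(layers[-1], C, E, trans)
        if nxt == layers[-1]:
            return layers
        layers.append(nxt)

def extend_to_violation(s, layers, trans, env_choice):
    """Extend a doomed state s to a bad state by descending through the layers.
    Raises StopIteration if s is not doomed."""
    k = next(i for i, U in enumerate(layers) if s in U)
    path = [s]
    while k > 0:
        c, e = path[-1]
        e2 = env_choice(c, e)            # whatever the environment does ...
        c2 = next(c2 for c2 in trans(c, e, e2) if (c2, e2) in layers[k - 1])
        path.append((c2, e2))            # ... the module can move one layer down
        k -= 1
    return path

# Hypothetical toy module: a counter that may increment when the external
# signal is raised, and that can no longer be stopped once it reaches 2.
C, E = range(4), (0, 1)

def trans(c, e, e2):
    if c >= 2 or e2 == 1:
        return {c, min(c + 1, 3)}
    return {c}

bad = {(3, e) for e in E}                 # the invariant is "counter < 3"
layers = doomed_layers(bad, C, E, trans)
doomed = layers[-1]
controllable = set(product(C, E)) - doomed
path = extend_to_violation((2, 0), layers, trans, lambda c, e: 0)
```

In this toy model the states with counter value 2 are doomed one step before the invariant is actually violated, which is the effect the early detection exploits.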

4 Lazy and Constrained Controllability 

In the previous section, we have used the notion of controllability to compute 
sets of doomed states, from which we know that there is a path violating the 
invariant. In order to detect errors early, we should compute the largest possible 
sets of doomed states. To this end, we introduce two notions of controllability 
that can be stronger than the classical definition of the previous section. The 
first notion, lazy controllability, can be applied to systems that are composed 
only of lazy modules, i.e. of modules that need not react to their inputs. Several 
communication protocols can be modeled as the composition of lazy modules. 
The second notion, constrained controllability, can be applied to any system. 

Lazy controllability. A module is lazy if it always has the option of leaving
its controlled variables unchanged. Formally, a module $P$ is lazy if we have
$(s, s) \models T_P$ for every state $s$ over $V_P$. If all the modules composing the
system are lazy, then we can re-examine the notion of controllability described
in Section 3 to take into account this fact. Precisely, we defined a state to be
controllable w.r.t. an LTL property $\varphi$ if there is a strategy for the
environment to ensure that the resulting trace satisfies $\varphi$, regardless of the
strategy used by the system. But if the environment is lazy, we must always
account for the possibility that the environment plays according to its lazy
strategy, in which the values of the external variables of the module never
change. Hence, if all modules are lazy, there is a second condition that has to
be satisfied for a state to be controllable: for every strategy of the module,
the lazy environment strategy should lead to a trace that satisfies $\varphi$. It is
easy to see, however, that this second condition for controllability subsumes
the first. We can summarize these considerations with the following definition.
For $1 \leq i \leq n$, denote by $\eta^{\ell}_i$ the lazy environment strategy of module
$P_i$, which leaves the values of the external variables of $P_i$ always unchanged.
We say that a state $s \in \mathit{States}(V_{P_i})$ is lazily controllable with respect to
an LTL formula $\varphi$ iff, for every module strategy $\pi$, we have
$\mathit{Outcome}(s, \pi, \eta^{\ell}_i) \models \varphi$. We let $\mathit{LCtr}(P, \varphi)$ be the predicate over $V_P$
defining the set of states of $P$ that are lazily controllable with respect to
$\varphi$.

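We can compute for the invariant $\Box\varphi$ the predicate $\mathit{LCtr}(P, \Box\varphi)$ by
replacing the operator $\mathit{UPre}$ in Algorithm 1 with the operator
$\mathit{LUPre}_P : \mathcal{P}(V_P) \to \mathcal{P}(V_P)$, the lazily uncontrollable predecessor
operator, defined by:

$$\mathit{LUPre}_P(X) = \exists \mathcal{C}'_P \,.\, (T_P \wedge X')[\mathcal{E}_P / \mathcal{E}'_P] ,$$

where $(T_P \wedge X')[\mathcal{E}_P / \mathcal{E}'_P]$ is obtained from $T_P \wedge X'$ by replacing each
variable $x' \in \mathcal{E}'_P$ with $x \in \mathcal{E}_P$. Note that $\mathit{LUPre}_P(X)$ computes a superset
of $\mathit{UPre}_P(X)$, and therefore the set $\mathit{LCtr}(P, \Box\varphi)$ of lazily controllable
states is always a subset of the controllable states $\mathit{Ctr}(P, \Box\varphi)$.

Given $n \geq 1$ lazy modules $P_1, P_2, \ldots, P_n$ and a predicate
$\varphi \in \mathcal{P}(\bigcup_{i=1}^{n} V_{P_i})$, let $R = P_1 \parallel \cdots \parallel P_n$, and for all $1 \leq i \leq n$
let $\mathit{abs}_i(\varphi)$ be defined as in Section 3. We can check whether
$P_1 \parallel \cdots \parallel P_n \models \Box\varphi$ by executing
$\mathit{InvCheck}(R, \varphi \wedge \bigwedge_{i=1}^{n} \mathit{LCtr}(P_i, \Box\mathit{abs}_i(\varphi)))$. If this check returns
answer No, we can construct a counterexample to $\Box\varphi$ as in Section 3.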
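The difference between the classical and the lazy predecessor operators can be seen on a small explicit-state example. The sketch below is illustrative only: states are pairs `(c, e)`, and `trans(c, e, e2)` is a hypothetical function giving the module's possible next controlled values once the next external value `e2` is known. The lazy operator simply instantiates `e2` with the current value `e`.

```python
from itertools import product

def upre(X, C, E, trans):
    """Classical operator: for all e2 there is a module move into X."""
    return {(c, e) for c, e in product(C, E)
            if all(any((c2, e2) in X for c2 in trans(c, e, e2)) for e2 in E)}

def lupre(X, C, E, trans):
    """Lazy operator: the external value stays at e; the module moves into X."""
    return {(c, e) for c, e in product(C, E)
            if any((c2, e) in X for c2 in trans(c, e, e))}

def doomed(bad, pre):
    """Iterate U_k = bad | pre(U_{k-1}) to its fixpoint."""
    U = set(bad)
    while True:
        nxt = set(bad) | pre(U)
        if nxt == U:
            return U
        U = nxt

# Hypothetical toy module: a counter that may increment only while the
# external signal is raised. The invariant is "counter < 3".
C, E = range(4), (0, 1)

def trans(c, e, e2):
    return {c, min(c + 1, 3)} if e2 == 1 else {c}

bad = {(3, e) for e in E}
classical = doomed(bad, lambda X: upre(X, C, E, trans))
lazy = doomed(bad, lambda X: lupre(X, C, E, trans))
```

An unconstrained environment can always lower the signal, so classically only the bad states are doomed. A lazy environment may leave the signal raised, so every state with the signal raised is lazily doomed.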

Constrained controllability. Consider again $n \geq 1$ modules
$P_1, P_2, \ldots, P_n$, together with a predicate $\varphi \in \mathcal{P}(\bigcup_{i=1}^{n} V_{P_i})$. In
Section 3, we defined a state to be controllable if it can be controlled by an
unconstrained environment, which can update the external variables of the
module in an arbitrary way. However, in the system under consideration, the
environment of a module $P_i$ is
$Q_i = P_1 \parallel \cdots \parallel P_{i-1} \parallel P_{i+1} \parallel \cdots \parallel P_n$, for $1 \leq i \leq n$. This
environment cannot update the external variables of $P_i$ in an arbitrary way,
but is constrained in doing so by the transition predicates of modules $P_j$,
for $1 \leq j \leq n$, $j \neq i$. If we compute the controllability predicate with
respect to the most general environment, instead of $Q_i$, we are giving to the
environment in charge of controlling $P_i$ more freedom than it really has. To
model this restriction, we can consider games in which the environment of
$P_i$ is constrained by a transition predicate over $V_{P_i} \cup \mathcal{E}'_{P_i}$ that
over-approximates the transition predicate of $Q_i$. We rely on an
over-approximation to avoid mentioning all the variables in
$\bigcup_{j=1}^{n} V_{P_j}$, since this would enlarge the state space on which the
controllability predicate is computed.

These considerations motivate the following definitions. Consider a module $P$
together with a transition predicate $H$ over $V_P \cup \mathcal{E}'_P$. An $H$-constrained
strategy for the environment of $P$ is a strategy
$\eta : \mathit{States}(V_P)^{+} \to \mathit{States}(\mathcal{E}_P)$ such that, for all
$s_0, s_1, \ldots, s_k \in \mathit{States}(V_P)^{+}$, we have $(s_k, \eta(s_0, s_1, \ldots, s_k)) \models H$.
Given an LTL formula $\varphi$ over $V_P$, we say that a state $s \in \mathit{States}(V_P)$ is
$H$-controllable if there is an $H$-constrained environment strategy $\eta$ such
that, for every module strategy $\pi$, we have $\mathit{Outcome}(s, \pi, \eta) \models \varphi$. We
let $\mathit{CCtr}(P, \langle\!\langle H \rangle\!\rangle \varphi)$ be the predicate over $V_P$ defining the set of
$H$-controllable states of $P$ w.r.t. $\varphi$.^1 For invariant properties, the
predicate $\mathit{CCtr}(P, \langle\!\langle H \rangle\!\rangle \Box\varphi)$ can be computed by replacing in Algorithm 1
the operator $\mathit{UPre}$ with the operator
$\mathit{CUPre}_P[H] : \mathcal{P}(V_P) \to \mathcal{P}(V_P)$, defined by:

$$\mathit{CUPre}_P[H](X) = \forall \mathcal{E}'_P \,.\, (H \rightarrow \exists \mathcal{C}'_P \,.\, (T_P \wedge X')) .$$

When $H = \mathit{true}$, $\mathit{CUPre}_P[H](X) = \mathit{UPre}_P(X)$; for all stronger predicates
$H$, the $H$-uncontrollable predecessor operator $\mathit{CUPre}_P[H](X)$ will be a
superset of $\mathit{UPre}_P(X)$, and therefore the set $\mathit{CCtr}(P, \langle\!\langle H \rangle\!\rangle \Box\varphi)$ of
$H$-controllable states will be a subset of the controllable states
$\mathit{Ctr}(P, \Box\varphi)$.

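^1 If $E_H$ is a module composable with $P$ having transition relation $H$, the
predicate $\mathit{CCtr}(P, \langle\!\langle H \rangle\!\rangle \varphi)$ defines exactly the same set of states as the
ATL formula $\langle\!\langle E_H \rangle\!\rangle \varphi$ interpreted over $P \parallel E_H$ [AHK97].

Given a system $R = P_1 \parallel P_2 \parallel \cdots \parallel P_n$ and a predicate $\varphi \in \mathcal{P}(V_R)$, for
$1 \leq i \leq n$ we let

[display equation defining $H_i$ missing in source]

where $U_j = V_{P_j} \setminus V_{P_i}$. We can then check whether $R \models \Box\varphi$ by executing
$\mathit{InvCheck}(R, \varphi \wedge \bigwedge_{i=1}^{n} \mathit{CCtr}(P_i, \langle\!\langle H_i \rangle\!\rangle \Box\mathit{abs}_i(\varphi)))$. If this check
returns answer No, we can construct a counterexample proceeding as in
Section 3.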
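The constrained predecessor operator can be sketched on an explicit-state toy model as follows. This is an illustration under assumptions, not the paper's symbolic implementation: states are pairs `(c, e)`, `trans(c, e, e2)` is a hypothetical module transition function, and the constraint is a hypothetical Boolean function `h(c, e, e2)` stating which next external values the environment may choose.

```python
from itertools import product

def cupre(X, C, E, trans, h):
    """Constrained operator: every environment move allowed by h
    leaves the module a move into X. If h allows no move at all,
    the condition holds vacuously, as in the universally quantified formula."""
    return {(c, e) for c, e in product(C, E)
            if all(any((c2, e2) in X for c2 in trans(c, e, e2))
                   for e2 in E if h(c, e, e2))}

def doomed(bad, pre):
    """Iterate U_k = bad | pre(U_{k-1}) to its fixpoint."""
    U = set(bad)
    while True:
        nxt = set(bad) | pre(U)
        if nxt == U:
            return U
        U = nxt

# Hypothetical toy module: a counter that may increment only while the
# external signal is raised. The invariant is "counter < 3".
C, E = range(4), (0, 1)

def trans(c, e, e2):
    return {c, min(c + 1, 3)} if e2 == 1 else {c}

def latched(c, e, e2):
    """Hypothetical constraint: once raised, the signal stays raised."""
    return e2 == 1 if e == 1 else True

bad = {(3, e) for e in E}
unconstrained = doomed(bad, lambda X: cupre(X, C, E, trans, lambda c, e, e2: True))
constrained = doomed(bad, lambda X: cupre(X, C, E, trans, latched))
```

With the trivial constraint the operator coincides with the classical one. The stronger constraint removes the environment's ability to lower the signal, so more states become doomed.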

5 Experimental Results 

We applied our methods for early error detection to two examples: a distributed 
database protocol and a wireless communication protocol. We implemented all 
algorithms on top of the model checker Mocha [AHM+98], which relies on the 
BDD package and image computation engine provided by VIS [BHSV+96]. 

Demarcation protocol. The demarcation protocol is a distributed protocol 
for maintaining numerical constraints between distributed copies of a database 
[BGM92]. We considered an instance of the protocol that manages two sites that 
sell and buy back seats on the same airplane; each site is modeled by a module. 
In order to minimize communication, each site maintains a demarcation variable 
indicating the maximum number of seats it can sell autonomously; if the site 
wishes to sell more seats than this limit, it enters a negotiation phase with the 
other site. The invariant states that the total number of seats sold is always less 
than the total available. 

In order to estimate the sensitivity of our methods to differences in modeling 
style, we wrote three models of the demarcation protocol; the models differ in 
minor details, such as the maximum number of seats that can be sold or bought 
in a single transaction, or the implementation of the communication channels. 
In all models, each of the two modules controls over 20 variables, and has 8-10 
external variables; the diameter of the set of reachable states is between 80 
and 120. We present the number of iterations required for finding errors in the 
three models using the various notions of controllability in Table 1. Some of the 
errors occurred in the formulation of the models, others were seeded at random. 

Table 1. Number of iterations required in global state exploration to find
errors in 3 models of the demarcation protocol. The errors are e1, ..., e4. The
columns are L (lazy controllability), C (constrained controllability), R
(regular controllability), and G (traditional global state exploration).
Panels: (a) Model 1, (b) Model 2, (c) Model 3.

Two-chip intercom. The second example is from the Two-Chip Intercom (TCI)
project of the Berkeley Wireless Research Center [BWR]. TCI is a wireless local
network which allows approximately 40 remotes to transmit voice with
point-to-point and broadcast communication. The operation of the network is
coordinated by a base station, which assigns channels to the remotes through a
TDMA scheme. Each remote and base station will be implemented in a two-chip
solution, one for the digital component and one for the analog. The TCI
protocol involves four layers: the functional layer (UI), the transport layer,
the medium access control (MAC) layer and the physical layer. The UI provides
an interface between the user and the remote. The transport layer accepts
service requests from the UI, defines the corresponding messages to be
transmitted across the network, and transmits the messages in packets. The
transport layer also accepts and interprets the incoming packets and sends the
messages to the UI. The MAC layer implements the TDMA scheme. The protocol
stack for a remote is shown in Figure 1(a). Each of these blocks is described
by the designers in Esterel and modeled in Polis using Codesign Finite State
Machines [BCG+97].

There are four main services available to a user: ConnReq, AddReq, RemReq
and DiscReq. To enter the network, a remote sends a connection request,
ConnReq, together with the id of the remote, to the base station. The base
station checks that the remote is not already registered, and that there is a
free time-slot for the remote. It then registers the remote, and sends a
connection grant back to the remote. If a remote wishes to leave the network,
it sends DiscReq to
the base station, which unregisters the remote. If two or more remotes want to 
start a conference, one of them sends AddReq to the base station, together with 
the id’s of the remotes with which it wants to communicate. The base station 
checks that the remotes are all registered, and sends to each of these remotes an 
acknowledgment and a time-slot assignment for the conference. When a remote 
wishes to leave the conference, it sends a RemReq request to the base station, 
which reclaims the time slot allocated to the remote. 

We consider a TCI network involving one remote and one base station. The 
invariant states that if a remote believes that it is connected to the network, 
then the base station has this remote registered. This property involves the 
functional and transport layers. In our experiment, we model the network in 
reactive modules [AH99]. The modules that model the functional and transport 
layers for both the remote and the base station are translated directly from the 
corresponding CFSM models; based on the protocol specification, we provide 
abstractions for the MAC layer and physical layer as well as the channel between 
the remote and the base station. Due to the semantics of CFSM, the modules are 
lazy, and therefore, lazy controllability applies. The final model has 83 variables. 
The number of iterations required to discover the various errors, some incurred 
during the modeling and some seeded in, are reported in Figure 1(b). 

Fig. 1. The TCI protocol stack and the number of iterations of global state
exploration to discover the error. Panels: (a) Protocol stack, (b) Iterations.

Results on BDD sizes and discussion. In order to isolate the unpredictable
effect of dynamic variable ordering on the BDD sizes, we conducted, for each
error, two sets of experiments. In the first set of experiments, we turned off
dynamic variable ordering, but supplied good initial orders. In the second,
dynamic variable ordering was turned on, and a random initial order was given.
Since the maximum BDD size is often the limiting factor in formal verification,
we give results based on the maximum number of BDD nodes encountered during
the verification process, taking into account the BDDs composing the
controllability predicates, the reachability predicate, and the transition
relation of the system under consideration. We only compare our results for
the verification using lazy controllability and global state exploration, since
these are the most significant comparisons. Due to space constraints, we give
results for model 3 of the demarcation protocol as well as the TCI protocol.

Without dynamic variable ordering. For each error, we recorded the maximum
number of BDD nodes allocated by the BDD manager during the verification
process. The results given in Tables 2(a) and 2(b) are the averages of four
experiment runs, each with a different initial variable order. They show that
the computation of the controllability predicates often helps reduce the total
amount of required memory by about 10-20%. This saving can be attributed to
the fact that fewer iterations in global state exploration avoid the possible
BDD blow-up in subsequent post-image computations.

With dynamic variable ordering. The analysis of BDD performance is more
difficult if dynamic variable ordering is used. We present the results in
Tables 2(c) and 2(d), which show the averages of nine experiment runs on the
same models with dynamic variable ordering on. Dynamic variable ordering tries
to minimize the total size of all the BDDs, taking into account the BDDs
representing the controllability and the reachability predicates, as well as
the BDDs encoding the transition relation of the system. Hence, if the BDDs for
the controllability predicates are a sizeable fraction of the other BDDs, their
presence slows down the reordering process, and hampers the ability of the
reordering process to reduce the size of the BDD of the reachability predicate.
Thus, while our methods consistently reduce the number of iterations required
in global state exploration to discover the error, occasionally we do not
achieve savings in terms of memory requirements.

Table 2. Average maximum number of BDD nodes required for error detection
during the controllability (Control) and reachability computation (Total)
phases. Dynamic variable ordering was turned off in (a) and (b), and on in (c)
and (d). The results are given for lazy controllability and global state
exploration. All data are in thousands of BDD nodes, and the standard
deviations are given in parentheses. Panels: (a) Demarcation Protocol (Off),
(b) TCI (Off), (c) Demarcation Protocol (On), (d) TCI (On).

When the controllability predicates are small compared to the reachability
predicate, they do not interfere with the variable ordering algorithm. This
observation suggests the following heuristic: one can alternate the iterations
in the computation of the controllability and reachability predicates in the
following manner. At each step, an iteration of the controllability predicate
is computed only when its size is smaller than a threshold fraction (say, 50%)
of the reachability predicate. Otherwise, reachability iterations are carried
out. Another possible heuristic to reduce the size of the BDD representation of
the controllability predicates is to allow approximations: our algorithms
remain sound and complete as long as we use over-approximations of the
controllability predicates.



6 Bounded Controllability and Iterative Strengthening 



Bounded controllability. In lazy controllability, we know that there is a move 
of the environment that is always enabled (the move that leaves all external 
variables unchanged); therefore, that move must be able to control the module. 
In constrained controllability, we are given the set of possible environment mo- 
ves, and we require that one of those moves is able to control the module. We 
can combine these two notions in the definition of bounded controllability. In 
bounded controllability, unlike in usual games, the environment may have some 
degree of insuppressible internal nondeterminism. For each state, we are given a 
(nonempty) set A of possible environment moves, as in usual games. In addition, 
we are also given a (possibly empty) set $B \subseteq A$ of moves that the environment 
can take at its discretion, even if they are not the best moves to control the 
module. We say that a state is boundedly controllable if (a) there is a move in A 
that can control the state, and (b) all the moves in B can control the state. The 
name bounded controllability is derived from the fact that the sets B and A are 
the lower and upper bounds of the internal nondeterminism of the controller. 

Given a module $P$, we can specify the lower and upper bounds for the
environment nondeterminism using two predicates
$H^l, H^u \in \mathcal{P}(V_P \cup \mathcal{E}'_P)$. We can then define the bounded uncontrollable
predecessor operator $\mathit{BUPre}[H^l, H^u] : \mathcal{P}(V_P) \to \mathcal{P}(V_P)$ by

$$\mathit{BUPre}[H^l, H^u](X) = [\forall \mathcal{E}'_P \,.\, (H^u \rightarrow \exists \mathcal{C}'_P \,.\, (T_P \wedge X'))] \vee [\exists \mathcal{E}'_P \,.\, (H^l \wedge \exists \mathcal{C}'_P \,.\, (T_P \wedge X'))] .$$

Note that the quantifiers are the duals of the ones in our informal definition,
since this operator computes the uncontrollable states, rather than the
controllable ones. Note also that in general we cannot eliminate the first
disjunct, unless we know that $\exists \mathcal{E}'_P \,.\, H^l$ holds at all
$s \in \mathit{States}(P)$, as was the case for lazy controllability. By substituting
this predecessor operator for $\mathit{UPre}$ in Algorithm 1, given a predicate $\varphi$
over $V_P$, we can compute the predicate $\mathit{BCtr}[H^l, H^u](P, \Box\varphi)$ defining the
states of $P$ that are boundedly controllable w.r.t. $\Box\varphi$. Given a system
$R = P_1 \parallel \cdots \parallel P_n$ and a predicate $\varphi$ over $V_R$, we can use bounded
controllability to compute a set of doomed states as follows. For each
$1 \leq i \leq n$, we let as usual $\mathit{abs}_i(\varphi) = \exists (V_R \setminus V_{P_i}) \,.\, \varphi$, and we
compute the lower and upper bounds by

[display equations defining $H_i^l$ and $H_i^u$ missing in source]

where for $1 \leq j \leq n$, the set $U_j = V_{P_j} \setminus V_{P_i}$ consists of the variables
of $P_j$ not present in $P_i$. We can then check whether $R \models \Box\varphi$ by executing
$\mathit{InvCheck}(R, \varphi \wedge \bigwedge_{i=1}^{n} \mathit{BCtr}[H_i^l, H_i^u](P_i, \Box\mathit{abs}_i(\varphi)))$. If this
check fails, we can construct counterexamples by proceeding as in Section 3.
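The bounded predecessor operator combines the two disjuncts above. The following explicit-state sketch is illustrative only: states are pairs `(c, e)`, and `trans`, `h_low` and `h_up` are hypothetical Python functions standing in for the transition predicate and for the lower and upper bounds. It also shows that lazy controllability arises as the special case in which the lower bound is "leave the external value unchanged" and the upper bound is trivial.

```python
from itertools import product

def bupre(X, C, E, trans, h_low, h_up):
    """Bounded operator: either every move allowed by h_up fails to control
    the module, or some move the environment may take at its discretion
    (allowed by h_low) fails to control it."""
    def module_reaches_x(c, e, e2):
        return any((c2, e2) in X for c2 in trans(c, e, e2))
    return {(c, e) for c, e in product(C, E)
            if all(module_reaches_x(c, e, e2) for e2 in E if h_up(c, e, e2))
            or any(h_low(c, e, e2) and module_reaches_x(c, e, e2) for e2 in E)}

def doomed(bad, pre):
    """Iterate U_k = bad | pre(U_{k-1}) to its fixpoint."""
    U = set(bad)
    while True:
        nxt = set(bad) | pre(U)
        if nxt == U:
            return U
        U = nxt

# Hypothetical toy module: a counter that may increment only while the
# external signal is raised. The invariant is "counter < 3".
C, E = range(4), (0, 1)

def trans(c, e, e2):
    return {c, min(c + 1, 3)} if e2 == 1 else {c}

bad = {(3, e) for e in E}
anything = lambda c, e, e2: True
nothing = lambda c, e, e2: False
unchanged = lambda c, e, e2: e2 == e      # the lazy move

classical_like = doomed(bad, lambda X: bupre(X, C, E, trans, nothing, anything))
lazy_like = doomed(bad, lambda X: bupre(X, C, E, trans, unchanged, anything))
```

With an empty lower bound only the first disjunct matters and the classical result is obtained. With the lazy move as lower bound the second disjunct dominates.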

Iterative strengthening. We can further strengthen the controllability
predicates by the process of iterative strengthening. This process is based on
the following observation. In the system $R = P_1 \parallel \cdots \parallel P_n$, in order to
control $P_i$, the environment of $P_i$ must not only take transitions compatible
with the transition relation of the modules $P_j$, for
$j \in \{1, \ldots, n\} \setminus \{i\}$, but these modules must also stay in their own sets
of controllable states. This suggests that when we compute the controllable
states of $P_i$, we take into account the controllability predicates already
computed for the other modules. For $1 \leq i \leq n$, if $\delta_i$ is the
controllability predicate of module $P_i$, we can compute the upper bound to the
environment nondeterminism by

[display equation defining $H_i^u(\delta)$ missing in source]

where $\delta = \delta_1, \ldots, \delta_n$. For all $1 \leq i \leq n$, we can compute a sequence of
increasingly strong controllability predicates by letting $\delta_i^0 = \mathit{true}$ and,
for $k \geq 0$, by $\delta_i^{k+1} = \mathit{BCtr}[H_i^l, H_i^u(\delta^k)](P_i, \Box\varphi)$. For all
$1 \leq i \leq n$ and all $k \geq 0$, predicate $\delta_i^{k+1}$ is at least as strong as
$\delta_i^k$. We can terminate the computation at any $k \geq 0$ (reaching a fixpoint
is not needed), and we can verify $R \models \Box\varphi$ by executing
$\mathit{InvCheck}(R, \varphi \wedge \bigwedge_{i=1}^{n} \delta_i^k)$. As $k$ increases, so does the cost of
computing these predicates. However, this increase may be offset by the faster
detection of errors in the global state-exploration phase.



Discussion. The early error detection techniques presented in the previous
sections for invariants can be straightforwardly extended to general linear
temporal-logic properties. Given a system $R = P_1 \parallel \cdots \parallel P_n$ and a general
LTL formula $\varphi$ over $V_R$, we first compute for each $1 \leq i \leq n$ the predicate
$\delta_i$, defining the controllable states of $P_i$ with respect to $\varphi$. This
computation requires the solution of $\omega$-regular games [EJ91, Tho95]; in the
solution, we can use the various notions of controllability developed in this
paper, such as lazy, constrained, or bounded controllability. Then, we check
whether $R \models \varphi \wedge \Box(\bigwedge_{i=1}^{n} \delta_i)$; as before, if a state that
falsifies $\delta_i$ for some $1 \leq i \leq n$ is entered, we can immediately conclude
that $R \not\models \varphi$. For certain classes of properties, such as reachability
properties, it is convenient to perform this check in two steps, first checking
that $R \models \Box(\bigwedge_{i=1}^{n} \delta_i)$ (enabling early error detection) and then
checking that $R \models \varphi$.

Acknowledgements. We thank Andreas Kuehlmann for pointing out the connection
of this work with target enlargement.

Abstract. A discrete strategy improvement algorithm is given for constructing
winning strategies in parity games, thereby providing also a new solution of
the model-checking problem for the modal $\mu$-calculus. Known strategy
improvement algorithms, as proposed for stochastic games by Hoffman and Karp in
1966, and for discounted payoff games and parity games by Puri in 1995, work
with real numbers and require solving linear programming instances involving
high precision arithmetic. In the present algorithm for parity games these
difficulties are avoided by the use of discrete vertex valuations in which
information about the relevance of vertices and certain distances is coded. An
efficient implementation is given for a strategy improvement step. Another
advantage of the present approach is that it provides a better conceptual
understanding and easier analysis of strategy improvement algorithms for parity
games. However, so far it is not known whether the present algorithm works in
polynomial time. The long standing problem whether parity games can be solved
in polynomial time remains open.



1 Introduction 

The study of the computational complexity of solving parity games has two main 
motivations. One is that the problem is polynomial time equivalent to the modal 
$\mu$-calculus model checking [7,18], and hence better algorithms for parity games 
may lead to better model checkers, which is a major objective in computer aided 
verification. 

The other motivation is the intriguing status of the problem from the point of 
view of structural complexity theory. It is one of the few natural problems which 
is in NP $\cap$ co-NP [7] (and even in UP $\cap$ co-UP [9]), and is not known to have a 
polynomial time algorithm, despite substantial effort of the community (see [7,1, 
17,10] and references therein). Other notable examples of such problems include 
simple stochastic games [3,4], mean payoff games [5,21], and discounted payoff 
games [21]. There are polynomial time reductions of parity games to mean payoff 
games [15,9], mean payoff games to discounted payoff games [21], and discounted 
payoff games to simple stochastic games [21]. Parity games, as the simplest of 
them all, seem to be the most plausible candidate for trying to find a polynomial 
time algorithm. 

A strategy improvement algorithm has been proposed for solving stochastic 
games by Hoffman and Karp [8] in 1966. Puri in his PhD thesis [15] has adapted 
the algorithm for discounted payoff games. Puri also provided a polynomial time 
reduction of parity games to mean payoff games, and advocated the use of the 
algorithm for solving parity games, and hence for the modal $\mu$-calculus model 
checking. 

In our opinion Puri’s strategy improvement algorithm for parity games has 
two drawbacks. 

— The algorithm uses high precision arithmetic, and needs to solve linear pro- 
gramming instances: both are typically costly operations. An implementa- 
tion (by the first author) of Puri’s algorithm, using a linear programming 
algorithm of Megiddo [12], proved to be prohibitively slow. 

— Solving parity games is a discrete, graph theoretic problem, but the crux of 
the algorithm is manipulation of real numbers, and its analysis is crucially 
based on continuous methods, such as Banach’s fixed point theorem. 

The first one makes the algorithm inefficient in practice, the other one obscures 
understanding of the algorithm. 

Our discrete strategy improvement algorithm remedies both above-mentioned 
shortcomings of Puri’s algorithm, while preserving the overall structure of the 
generic strategy improvement algorithm. We introduce discrete values (such as 
tuples of vertices, sets of vertices and natural numbers denoting lengths of paths 
in the game graph) which are being manipulated by the algorithm, instead of 
their encodings into real numbers. (One can show a precise relationship between 
behaviour of Puri’s and our algorithms; we will treat this issue elsewhere.) 

The first advantage of our approach is that instead of solving linear pro- 
gramming instances involving high precision arithmetic, we only need to solve 
instances of a certain purely discrete problem. Moreover, we develop an algorithm 
exploiting the structure of instances occurring in this context, i.e., relevance of 
vertices and certain distance information. In this way we get an efficient
implementation of a strategy improvement algorithm with $O(nm)$ running time for 
one strategy improvement step, where n is the number of vertices, and m is the 
number of edges in the game graph. 

The other advantage is more subjective: we believe that it is easier to analyze 
the discrete data maintained by our algorithm, rather than its subtle encodings 
into real numbers involving infinite geometric series [15]. The classical continuous 
reasoning gives a relatively clean proof of correctness of the algorithm in a more 







J. Voge and M. Jurdzinski 



general case of discounted payoff games [15], but we think that in the case of
parity games it blurs an intuitive understanding of the underlying discrete
structure. However, the long standing open question whether a strategy
improvement algorithm works in polynomial time [4] remains unanswered.
Nevertheless, we hope that our discrete analysis of the algorithm may help
either to find a proof of polynomial time termination, or to come up with a
family of examples on which the algorithm requires an exponential number of
steps. Either of those results would mark substantial progress in
understanding the computational complexity of parity games.

So far, for all families of examples we have considered, the strategy
improvement algorithm needs only a linear number of strategy improvement steps.
Notably, a linear number of strategy improvements suffices for several families
of difficult examples for which other known algorithms need exponential time.

In this extended abstract, some substantial proofs are omitted, in particular 
for Lemmas 3 and 5, as well as the detailed correctness proof of the algorithm 
of Section 4. For a more complete exposition see [19]. 



Acknowledgement 

We are indebted to Wolfgang Thomas for his invaluable advice, support, and 
encouragement. 



2 Preliminaries 

A parity game is an infinite two-person game played on a finite vertex-colored
graph $G = (V_0, V_1, E, c)$ where $V = V_0 \cup V_1$ is the set of vertices and
$E \subseteq V_0 \times V_1 \cup V_1 \times V_0$ the set of edges with
$\forall u \in V\ \exists v \in V : uEv$, and $c : V \to \{1, \ldots, d\}$
is a coloring of the vertices. The two players move, in alternation, a token along
the graph's edges; player 0 moves the token from vertices in $V_0$, player 1 from
vertices in $V_1$. A play is an infinite vertex sequence $v_0 v_1 v_2 \cdots$ arising in this way.
The decision who wins refers to the coloring $c$ of the game graph: if the largest
color which occurs infinitely often in $c(v_0) c(v_1) c(v_2) \cdots$ is even then player 0
wins, otherwise player 1 wins. One says that player 0 (resp. player 1) has a
winning strategy from $v \in V$ if, starting a play in $v$, he can force a win for
arbitrary choices of player 1 (resp. player 0). By the well-known determinacy of
parity games, the vertex set $V$ is divided into the two sets $W_0, W_1$, called the
winning sets, where $W_i$ contains the vertices from which player $i$ has a winning
strategy.
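The winning condition can be illustrated on a play that eventually repeats a cycle, which is the only kind of play arising from memoryless strategies. The following is a minimal sketch; the function name `winner` and the representation of a play by the colors on its repeated cycle are assumptions made for illustration.

```python
def winner(cycle_colors):
    """Return the winning player (0 or 1) of a play that eventually repeats a cycle.

    Only the colors on the repeated cycle occur infinitely often, so the
    finite prefix of the play does not influence the winner.
    """
    if not cycle_colors:
        raise ValueError("the cycle must contain at least one vertex")
    return 0 if max(cycle_colors) % 2 == 0 else 1


# Hypothetical play: some finite prefix, then the colors 2, 4, 3 repeated forever.
print(winner([2, 4, 3]))
```

In the example the largest recurring color is 4, which is even, so the sketch reports player 0 as the winner regardless of the prefix.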

Moreover, it is known (see, e.g., [6,14]) that on $W_i$ player $i$ has a memoryless
winning strategy, which prescribes the respective next move by a unique edge
leaving each vertex in $V_i$.

In the following we fix the basic notations for games and strategies. Let
$G = (V_0, V_1, E, c)$ be a game graph with the vertices $V = V_0 \cup V_1$ and the
coloring $c : V \to \{1, \ldots, d\}$. Let $W_0, W_1$ be the winning sets of $G$.
A strategy for player $i$ ($i = 0, 1$) is a function $\rho : V_i \to V_{1-i}$ such that $vE\rho(v)$
for all $v \in V_i$. A strategy for player $i$ is called a winning strategy on $W \subseteq V$
if player $i$ wins every play starting in $W$ when using that strategy. A winning
strategy for player $i$ is a winning strategy on the winning set $W_i$.

We set

$$V_+ = \{v \in V \mid c(v) \text{ is even}\} \quad\text{and}\quad V_- = \{v \in V \mid c(v) \text{ is odd}\}$$

and we call vertices in $V_+$ positive and those in $V_-$ negative.

We introduce two orderings of $V$, the relevance ordering $<$ and the reward
ordering $\prec$. The relevance order is a total order extending the pre-order given by
the coloring, i.e., such that $c(u) < c(v)$ implies $u < v$. (So higher colors indicate
higher relevance.)

By $\inf(\pi)$ we denote the set of vertices occurring infinitely often in the play
$\pi$. Referring to $<$ over $V$, we can reformulate the winning condition for player 0
in a play $\pi$ as

$$\max\nolimits_<(\inf(\pi)) \in V_+.$$

Another ordering, the reward order $\prec$, indicates the value of a vertex as seen
from player 0. The lowest reward originates from the vertex in $V_-$ of highest
relevance, the highest from the vertex in $V_+$ of highest relevance. Formally, we
define:

$$u \prec v \iff (u < v \land v \in V_+) \lor (v < u \land u \in V_-)$$

We extend the reward order $\prec$ to an order on sets of vertices. If $P, Q \in 2^V$
are distinct, the vertex $v$ of highest relevance in the symmetric difference $P \triangle Q$
decides whether $P \prec Q$ or $Q \prec P$: if $v \in V_+$ then the set containing $v$ is taken
as the higher one, if $v \in V_-$ then the set not containing $v$ is taken as the higher
one. Formally:

$$P \prec Q \iff P \neq Q \land \max\nolimits_<(P \triangle Q) \in Q \triangle V_-$$

Note that we use the same symbol $\prec$ for vertices and for sets of vertices.
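The two definitions of the reward order can be made concrete as follows. This is a sketch under stated assumptions: vertices are integers whose natural order stands in for the relevance order $<$ (so the example coloring must respect it), `color` is a dictionary mapping vertices to colors, and the function names are hypothetical.

```python
def reward_less(u, v, color):
    """u ≺ v on vertices; the integer order on vertices plays the role of relevance <."""
    def is_positive(x):
        return color[x] % 2 == 0

    return (u < v and is_positive(v)) or (v < u and not is_positive(u))


def set_reward_less(P, Q, color):
    """P ≺ Q on vertex sets: the most relevant vertex of the symmetric difference decides."""
    difference = P ^ Q
    if not difference:
        return False  # P == Q, so P is not strictly smaller
    top = max(difference)
    top_is_negative = color[top] % 2 == 1
    # top lies in the symmetric difference of Q and V_- exactly when
    # membership in Q and membership in V_- disagree.
    return (top in Q) != top_is_negative


# Hypothetical coloring: vertex k has color k, so relevance and color agree.
color = {1: 1, 2: 2, 3: 3, 4: 4}
print(sorted(color, key=lambda v: sum(reward_less(u, v, color) for u in color)))
```

With this coloring the vertices sorted by increasing reward are 3, 1, 2, 4: the most relevant negative vertex is worst for player 0 and the most relevant positive vertex is best.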

We also need a coarser version $\prec_w$ of $\prec$, using a reference vertex $w$ and taking
into account only $P$- and $Q$-vertices of relevance $> w$:

$$P \prec_w Q \iff P \cap \{x \in V \mid x > w\} \prec Q \cap \{x \in V \mid x > w\},$$

and the corresponding equivalence relation:

$$P \sim_w Q \iff P \preceq_w Q \land Q \preceq_w P$$

3 Game Graph Valuations 

In this section we present the terminology and main ideas underlying the
approximative construction of winning strategies. The plays considered in the sequel
are always induced by two given strategies $\sigma$ and $\tau$ for players 0 and 1
respectively. Any such pair $\sigma, \tau$ determines from any vertex a play ending in a loop $L$.
The first Subsection 3.1 introduces certain values called play profiles for such
plays. A play profile combines three data: the most relevant vertex of the loop $L$,
the vertices occurring in the play that are more relevant than the most relevant
vertex in the loop, and the number of all vertices that are visited before the most
relevant vertex of the loop. The order $\prec$ above is extended to play profiles (so
that $\prec$-higher play profiles are more advantageous for player 0). An arbitrary
assignment of play profile values to the vertices is called a valuation.

Subsection 3.2 gives a certain condition when a valuation originates from a
strategy pair $(\sigma, \tau)$ as explained. Such valuations are called locally progressive.

The algorithm will successively produce such locally progressive valuations.
In each step, a valuation is constructed from a strategy $\sigma$ of player 0 and an
'optimal' response strategy $\tau$ of player 1. In Subsection 3.3 this notion of optimality
is introduced. We show that a valuation which is optimal for both players 0 and
1 represents the desired solution of our problem: a pair of winning strategies for
the two players.

The last Subsection 3.4 will explain the nature of an approximation step,
from a given valuation $\varphi$ to a (next) 'response valuation' $\varphi'$. One should imagine
that player 0 picks edges to vertices with $\prec$-maximal values given by $\varphi$, and
that player 1 responds as explained above, by a locally progressive valuation $\varphi'$.
The key lemma says that uniformly $\varphi'$ will majorize $\varphi$ (i.e., $\varphi(v) \preceq \varphi'(v)$ for all
vertices $v \in V$). This will be the basis for the termination proof because equality
$\varphi = \varphi'$ will imply that $\varphi$ is already optimal for both players.

3.1 Play Profiles 

Let $G = (V_0, V_1, E, c)$ be a game graph. Let $\Pi$ be the set of plays that can
be played on this graph. We define the function $w : \Pi \to V$ which computes
the most relevant vertex that is visited infinitely often in a play $\pi$, i.e.,
$w(\pi) = \max_< \inf(\pi)$. Furthermore, we define a function $\alpha : \Pi \to 2^V$ which computes
the vertices that are visited before the first occurrence of $w(\pi)$:

$$\alpha(\pi) = \{u \in V \mid \exists i \in \mathbb{N}_0 : \pi_i = u \land \forall k \in \mathbb{N}_0 : k \le i \Rightarrow \pi_k \neq w(\pi)\}.$$

We are interested in three characteristic values of a play $\pi$:

1. The most relevant vertex $u_\pi$ that is visited infinitely often: $u_\pi = w(\pi)$.

2. The set of vertices $P_\pi$ that are more relevant than $u_\pi$ and visited before the
   first visit of $u_\pi$: $P_\pi = \alpha(\pi) \cap \{v \in V \mid v > w(\pi)\}$.

3. The number of vertices visited before the first visit of $u_\pi$: $e_\pi = |\alpha(\pi)|$.

We call such a triple $(u_\pi, P_\pi, e_\pi)$ a play profile. It induces an equivalence relation
of plays on the given game graph. By definition each profile of a play induced
by a pair of strategies belongs to the following set:

$$\mathcal{D} = \{(u, P, e) \in V \times 2^V \times \{0, \ldots, |V| - 1\} \mid \forall v \in P : u < v \land |P| \le e\}.$$

The set $\mathcal{D}$ is divided into profiles of plays that are won by player 0, respectively 1:

$$\mathcal{D}_0 = \{(u, P, e) \in \mathcal{D} \mid u \in V_+\} \quad\text{and}\quad \mathcal{D}_1 = \{(u, P, e) \in \mathcal{D} \mid u \in V_-\}.$$
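A play profile can be computed directly by following the unique play fixed by a strategy pair. The sketch below assumes that both strategies have been merged into a single dictionary `succ` mapping each vertex to its chosen successor, and that vertices are integers whose natural order stands in for the relevance order; both the representation and the name `play_profile` are illustrative assumptions.

```python
def play_profile(start, succ):
    """Return (u, P, e) for the play from `start` that follows `succ` forever."""
    path, position = [], {}
    v = start
    while v not in position:          # walk until a vertex repeats
        position[v] = len(path)
        path.append(v)
        v = succ[v]
    loop = path[position[v]:]         # the vertices visited infinitely often
    u = max(loop)                     # most relevant vertex of the loop
    before = path[:path.index(u)]     # vertices visited before the first visit of u
    P = frozenset(x for x in before if x > u)
    return (u, P, len(before))


# Hypothetical strategy pair: 5 -> 1 -> 3 -> 2 -> 3 -> ...
succ = {5: 1, 1: 3, 3: 2, 2: 3}
print(play_profile(5, succ))
```

From vertex 5 the play reaches the loop between 3 and 2 after visiting 5 and 1, so the profile records 3 as the most relevant loop vertex, the set {5} of more relevant vertices seen on the way, and the distance 2.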

We define a linear ordering $\prec$ on $\mathcal{D}$. One play profile is greater than another
with respect to this ordering if the plays it describes are 'better' for player 0,
respectively smaller if they are better for player 1. Let $(u, P, e), (v, Q, f) \in \mathcal{D}$:

$$
(u, P, e) \prec (v, Q, f) \iff
\begin{aligned}[t]
& u \prec v \\
\lor\ & (u = v \land P \prec Q) \\
\lor\ & (u = v \land P = Q \land v \in V_- \land e < f) \\
\lor\ & (u = v \land P = Q \land v \in V_+ \land e > f)
\end{aligned}
$$

The idea behind the last two clauses is that in case $u = v$ and $P = Q$ it is more
advantageous for player 0 to have a shorter distance to the most relevant vertex
$v$ if $v \in V_+$ (case $f < e$), resp. a longer distance if $v \in V_-$ (case $f > e$).
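The four clauses of the ordering on play profiles translate into a short comparison function. As before this is only a sketch: vertices are assumed to be integers ordered by relevance, `color` maps vertices to colors, profiles are tuples `(u, P, e)` with `P` a frozenset, and all function names are hypothetical.

```python
def is_positive(v, color):
    return color[v] % 2 == 0


def vertex_less(u, v, color):
    """Reward order on vertices."""
    return (u < v and is_positive(v, color)) or (v < u and not is_positive(u, color))


def set_less(P, Q, color):
    """Reward order on vertex sets, decided by the top vertex of the symmetric difference."""
    difference = P ^ Q
    if not difference:
        return False
    top = max(difference)
    return (top in Q) != (not is_positive(top, color))


def profile_less(p, q, color):
    """(u, P, e) ≺ (v, Q, f), following the four clauses of the definition."""
    (u, P, e), (v, Q, f) = p, q
    if u != v:
        return vertex_less(u, v, color)
    if P != Q:
        return set_less(P, Q, color)
    # Same loop vertex and same set: only the distance component decides.
    return e > f if is_positive(v, color) else e < f


color = {1: 1, 2: 2, 3: 3, 4: 4}
empty = frozenset()
print(profile_less((4, empty, 3), (4, empty, 1), color))
```

The example compares two profiles whose loop vertex is the positive vertex 4; the one with the shorter distance is the larger profile, so the call reports that the profile with distance 3 is smaller.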

For the subsequent lemmata we need a coarser ordering of play profiles:

$$(u, P, e) \prec_w (v, Q, f) \iff u \prec v \lor (u = v \land P \prec_w Q)$$

and a corresponding equivalence relation

$$(u, P, e) \sim_w (v, Q, f) \iff u = v \land P \sim_w Q$$



3.2 Valuations 

A valuation is a function $\varphi : V \to \mathcal{D}$ which labels each vertex with a play profile.

We are interested in valuations where all play profiles $\varphi(v)$ ($v \in V$) are
induced by the same pair of strategies. An initial vertex $v$ and two strategies
$\sigma$ for player 0 and $\tau$ for player 1 determine a unique play $\pi_{v,\sigma,\tau}$. For a pair of
strategies $(\sigma, \tau)$ we can define the valuation induced by $(\sigma, \tau)$ to be the function
$\varphi$ which maps $v \in V$ to the play profile of $\pi_{v,\sigma,\tau}$. This valuation $\varphi$ assigns to
each vertex the play profile of the play starting in $v$ played with respect to $\sigma$
and $\tau$. To refer to the components of a play profile $\varphi(v) = (u, P, e)$ we write $\varphi_0$,
$\varphi_1$ and $\varphi_2$, where $\varphi_0(v) = u$, $\varphi_1(v) = P$ and $\varphi_2(v) = e$. We call $u$ the most
relevant vertex of play profile $\varphi(v)$ (or of $v$, if $\varphi$ is clear from the context).

The play profiles in a valuation induced by a strategy pair are related as
follows: Let $\sigma, \tau$ be strategies and $\varphi$ their corresponding valuation. Let $x, y \in V$
be two vertices with $\sigma(x) = y$ or $\tau(x) = y$, i.e., in a play induced by $\sigma$ and $\tau$ a
move proceeds from $x$ to $y$. It follows immediately that $\varphi_0(x) = \varphi_0(y)$. We can
distinguish the following cases:

1. Case $x < \varphi_0(x)$: Then $\varphi_1(x) = \varphi_1(y)$ and $\varphi_2(x) = \varphi_2(y) + 1$.

2. Case $x = \varphi_0(x)$: By definition of $\varphi$ we have: $\varphi_1(x) = \emptyset$ and $\varphi_2(x) = 0$.
   Furthermore $\varphi_1(y) = \emptyset$, because there are no vertices on the loop through $x$
   that are more relevant than $x$.

3. Case $x > \varphi_0(x)$: Then $\varphi_1(x) = \{x\} \cup \varphi_1(y)$ and $\varphi_2(x) = \varphi_2(y) + 1$.

These conditions define what we call the $\varphi$-progress relation $\lhd_\varphi$. If an edge
leads from $x$ to $y$ then $x \lhd_\varphi y$ will mean that the $\varphi$-value is correctly updated
when passing from $x$ to $y$ in a play. Formally we define for $x, y \in V$, assuming
$\varphi(x) = (u, P, e)$, $\varphi(y) = (v, Q, f)$:

$$
x \lhd_\varphi y \iff u = v \land \bigl(
\begin{aligned}[t]
& (x = u \land P = Q = \emptyset \land e = 0) \\
\lor\ & (x < u \land P = Q \land e = f + 1) \\
\lor\ & (x > u \land P = Q \cup \{x\} \land e = f + 1) \bigr)
\end{aligned}
$$

The following proposition is straightforward. 

Proposition 1. Let $\varphi$ be a valuation, and let $v \lhd_\varphi \sigma(v)$ and $v \lhd_\varphi \tau(v)$, for all
$v \in V$. Then $(\sigma, \tau)$ induces $\varphi$.

Note that a valuation $\varphi$ may be induced by several pairs $(\sigma, \tau)$ of strategies
so that several plays starting in $v$ with play profile $\varphi(v)$ may exist. We now
characterize those valuations which are induced by some strategy pair $(\sigma, \tau)$.
A valuation $\varphi$ is called locally progressive if

$$\forall u \in V\ \exists v \in V : uEv \land u \lhd_\varphi v.$$

The next proposition follows immediately from the definitions.

Proposition 2. Let $G = (V_0, V_1, E, c)$ be a game graph. Let $\varphi$ be a valuation
for $G$. Then $\varphi$ is a locally progressive valuation iff there exists a strategy $\sigma$ for
player 0 and a strategy $\tau$ for player 1 such that $\varphi$ is a valuation induced by $(\sigma, \tau)$.

We call a strategy $\rho : V_i \to V_{1-i}$ ($i \in \{0, 1\}$) compatible with the valuation $\varphi$ if
$\forall v \in V_i : v \lhd_\varphi \rho(v)$. From Proposition 2 it follows that for a locally progressive
valuation $\varphi$ at least one strategy for each player is compatible with $\varphi$.
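The progress relation and local progressiveness defined in this subsection can be checked mechanically. The following sketch assumes that a valuation is a dictionary `phi` from integer vertices (ordered by relevance) to tuples `(u, P, e)` with `P` a frozenset, and that `succ` lists the successors of each vertex; these representations and the function names are assumptions for illustration.

```python
def progresses(x, y, phi):
    """Check the progress relation between x and y for the valuation phi."""
    u, P, e = phi[x]
    v, Q, f = phi[y]
    if u != v:
        return False
    if x == u:
        return P == Q == frozenset() and e == 0
    if x < u:
        return P == Q and e == f + 1
    return P == Q | {x} and e == f + 1


def locally_progressive(phi, succ):
    """Every vertex needs at least one successor along which phi progresses."""
    return all(any(progresses(x, y, phi) for y in succ[x]) for x in phi)


# Hypothetical valuation induced by the play 5 -> 1 -> 3 -> 2 -> 3 -> ...
phi = {
    5: (3, frozenset({5}), 2),
    1: (3, frozenset(), 1),
    3: (3, frozenset(), 0),
    2: (3, frozenset(), 1),
}
succ = {5: [1, 3], 1: [3], 3: [2], 2: [3]}
print(locally_progressive(phi, succ))
```

In the example the edge from 5 to 1 satisfies the relation while the edge from 5 to 3 does not, because the distance component would have to drop by two; the valuation is nevertheless locally progressive since each vertex has one suitable successor.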

3.3 Optimal Valuations 

A valuation $\varphi$ is called optimal for player 0 if the $\varphi$-progress relation $x \lhd_\varphi y$ only
applies to edges $xEy$ where the value $\varphi(y)$ is $\preceq$-maximal among the $E$-successors
of $x$ (i.e., among the values $\varphi(z)$ with $xEz$). However, a weaker requirement is
made if $x$ is the most relevant vertex associated to $x$ via $\varphi$ (i.e., $\varphi_0(x) = x$). In
this case we discard the distance component $\varphi_2$; formally, we replace $\preceq$ by $\preceq_x$.
(Recall that $(u, P, e) \prec_x (v, Q, f)$ holds if $u \prec v$ holds, or $u = v$ and $P \prec_x Q$, i.e.,
$e$ and $f$ are not taken into account.) Formally, $\varphi$ is called optimal for player 0 if
for all $x \in V_0$ and $y \in V_1$ with an edge $xEy$:

$$x \lhd_\varphi y \implies \forall z \in V_1 : xEz \Rightarrow \varphi(z) \preceq \varphi(y) \lor (\varphi_0(x) = x \land \varphi(z) \preceq_x \varphi(y))$$

In the last case, the vertex $y$ succeeds $x$ in the loop of a play given by a strategy
pair $(\sigma, \tau)$ which is compatible with $\varphi$; of course, there may be several such $y$
with $x \lhd_\varphi y$. Similarly $\varphi$ is called optimal for player 1 if for all $x \in V_1$ and $y \in V_0$
with an edge $xEy$:

$$x \lhd_\varphi y \implies \forall z \in V_0 : xEz \Rightarrow \varphi(y) \preceq \varphi(z) \lor (\varphi_0(x) = x \land \varphi(y) \preceq_x \varphi(z))$$

A valuation that is optimal for both players is called an optimal valuation.

It is useful to note the following fact: If $\varphi$ is optimal for player 0, then (since
$\preceq_x$ is coarser than $\preceq$) $x \lhd_\varphi y$ implies $\varphi(z) \preceq_x \varphi(y)$ for $xEz$. Similarly if $\varphi$ is
optimal for player 1 then $x \lhd_\varphi y$ implies $\varphi(y) \preceq_x \varphi(z)$ for $xEz$.

The optimal valuations are closely related to the desired solution of a game,
namely a pair of winning strategies. Consider a valuation $\varphi$ which is optimal for
player 0, and let $W_1$ be the set of vertices $u$ whose $\varphi$-value $\varphi(u)$ is in $\mathcal{D}_1$ (i.e., the
most relevant vertex $\varphi_0(u)$ associated to $u$ is in $V_-$, signalling a win of player 1).
Any strategy for player 1 compatible with $\varphi$ turns out to be a winning strategy
for him on $W_1$, whatever strategy player 0 chooses independently of $\varphi$. Applying
this symmetrically for both players we shall obtain a pair of winning strategies.

Lemma 1. Let $G = (V_0, V_1, E, c)$ be a game graph. Let $i \in \{0, 1\}$ and $\varphi$ be
a locally progressive valuation of $G$ which is optimal for player $i$. Let
$W_{1-i} = \{v \in V \mid \varphi(v) \in \mathcal{D}_{1-i}\}$. Then the strategies for player $1 - i$ compatible with $\varphi$ are
winning strategies on $W_{1-i}$ (against an arbitrary strategy of player $i$).

Applying this lemma symmetrically for both players leads to the following.

Theorem 1. Let $G = (V_0, V_1, E, c)$ be a game graph. Let $\varphi$ be an optimal locally
progressive valuation of $G$. Then all strategies compatible with $\varphi$ are winning
strategies.



3.4 Improving Valuations 

Given two locally progressive valuations $\varphi$ and $\varphi'$, we call $\varphi'$ improved for player 0
with respect to $\varphi$ if

$$\forall x \in V_0\ \exists y \in V_1 : xEy \land x \lhd_{\varphi'} y \land \forall z \in V_1 : xEz \Rightarrow \varphi(z) \preceq \varphi(y).$$

A locally progressive valuation $\varphi'$, which is improved for player 0 with respect
to $\varphi$, can be constructed from a given locally progressive valuation $\varphi$ by extracting
a strategy $\sigma : V_0 \to V_1$ for player 0 that chooses maximal $E$-successors with
respect to $\varphi$ and constructing a locally progressive valuation $\varphi'$ such that $\sigma$ is
compatible with $\varphi'$.

Lemma 2. Let $G = (V_0, V_1, E, c)$ be a game graph. Let $\varphi$ be a locally progressive
valuation optimal for player 1 and $\varphi'$ a locally progressive valuation that is
improved for player 0 with respect to $\varphi$. Then for all $v \in V$, we have $\varphi(v) \preceq \varphi'(v)$.

4 The Algorithm 

In this section we give an algorithm for constructing an optimal locally progressive
valuation for a given parity game graph. This will lead to winning strategies
for the game. The algorithm is split into three functions: main(), valuation(),
and subvaluation().

In the function main() a sequence of strategies for player 0 is generated and
for each of these strategies a locally progressive valuation is computed by calling
valuation(). This valuation is constructed such that the strategy of player 0 is
compatible with it and that it is optimal for player 1. The first strategy for
player 0 is chosen randomly. Subsequent strategies for player 0 are chosen such
that they are optimal with respect to the previous valuation. The main loop
terminates if the strategy chosen for player 0 is the same as in the previous
iteration. Finally a strategy for player 1 is extracted from the last computed
valuation.



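main(G):

 1. for each $v \in V_0$   {Select initial strategy for player 0.}
 2.     select $\sigma(v) \in V_1$ with $v E_G \sigma(v)$
 3. repeat
 4.     $G_\sigma = (V_0, V_1, E', c)$, where $\forall u, v \in V$:
            $u E' v \iff (\sigma(u) = v \land u \in V_0) \lor (u E_G v \land u \in V_1)$
 5.     $\varphi$ = valuation($G_\sigma$)
 6.     $\sigma' = \sigma$   {Store $\sigma$ under name $\sigma'$}
 7.     for each $v \in V_0$   {Optimize $\sigma$ locally according to $\varphi$}
 8.         if $\varphi(\sigma(v)) \prec \max_\prec \{d \in \mathcal{D} \mid \exists v' \in V : v E_G v' \land d = \varphi(v')\}$ then
 9.             select $\sigma(v) \in V_1$
                with $\varphi(\sigma(v)) = \max_\prec \{d \in \mathcal{D} \mid \exists v' \in V : v E_G v' \land d = \varphi(v')\}$
10. until $\sigma = \sigma'$
11. for each $v \in V_1$
12.     select $\tau(v) \in V_0$
        with $\varphi(\tau(v)) = \min_\prec \{d \in \mathcal{D} \mid \exists v' \in V : v E_G v' \land d = \varphi(v')\}$
13. $W_0 = \{v \in V \mid \varphi_0(v) \in V_+\}$
14. $W_1 = \{v \in V \mid \varphi_0(v) \in V_-\}$
15. return $W_0, W_1, \sigma, \tau$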



Fig. 1. This function computes the winning sets and a winning strategy for each player 
from a given game graph. 
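The local optimization in lines 7 to 9 of Fig. 1 can be sketched in isolation. The code below is not the authors' implementation: the valuation is passed in as a dictionary `phi`, the strict order on play profiles is supplied by the caller as a function `less`, and all names are hypothetical. A vertex keeps its current successor unless some other successor has a strictly larger value.

```python
def improve_strategy(sigma, succ, phi, less):
    """Return a new strategy that picks, for every vertex, a successor of maximal value.

    sigma: current strategy of player 0 (vertex -> chosen successor)
    succ:  all successors of each player 0 vertex in the game graph
    phi:   valuation (vertex -> value)
    less:  strict comparison on values, less(a, b) meaning a is worse than b for player 0
    """
    new_sigma = dict(sigma)
    for v, current in sigma.items():
        best = current
        for candidate in succ[v]:
            if less(phi[best], phi[candidate]):
                best = candidate
        new_sigma[v] = best
    return new_sigma


# Toy data: integers stand in for play profiles, compared with the usual order.
sigma = {"a": "x"}
succ = {"a": ["x", "y", "z"]}
phi = {"x": 1, "y": 3, "z": 2}
print(improve_strategy(sigma, succ, phi, lambda p, q: p < q))
```

The main loop of Fig. 1 stops exactly when this step returns the strategy it was given, which is why the sketch never replaces a successor by one of equal value.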



In the functions valuation() and subvaluation(), we use the functions reach(G, u),
minimal_distances(G, u), maximal_distances(G, u). The functions work on the
graph $G$ and perform a backward search on the graph starting in vertex $u$.

- The function reach(G, u) produces the set of all vertices from which the
  vertex $u$ can be reached in $G$ (done by a backward depth first search).

- The function minimal_distances(G, u) computes a vector $\delta : V_G \to \{0, \ldots, |V_G| - 1\}$
  where $\delta(v)$ is the length of the shortest path from $v$ to $u$ (done by
  a backward breadth first search starting from $u$).

- The function maximal_distances(G, u) yields a vector $\delta : V_G \to \{0, \ldots, |V_G| - 1\}$
  where $\delta(v)$ is the length of the longest path from $v$ to $u$ that does not
  contain $u$ as an intermediate vertex. This is done by a backward search
  starting from $u$ where a new vertex is visited if all its successors are visited
  before. This algorithm only works if every cycle in the graph contains $u$.
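The three helper functions can be sketched as follows. The graph is assumed to be given as a set of edge pairs `(a, b)`; this representation, the helper `predecessors`, and the convention that `u` itself belongs to `reach` and has distance 0 are assumptions of the sketch rather than details taken from the text.

```python
from collections import deque


def predecessors(edges):
    """Map each vertex to the list of vertices with an edge into it."""
    pred = {}
    for a, b in edges:
        pred.setdefault(b, []).append(a)
    return pred


def reach(edges, u):
    """Vertices from which u can be reached (backward depth first search)."""
    pred, seen, stack = predecessors(edges), {u}, [u]
    while stack:
        for p in pred.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def minimal_distances(edges, u):
    """Length of the shortest path to u (backward breadth first search)."""
    pred, dist, queue = predecessors(edges), {u: 0}, deque([u])
    while queue:
        x = queue.popleft()
        for p in pred.get(x, []):
            if p not in dist:
                dist[p] = dist[x] + 1
                queue.append(p)
    return dist


def maximal_distances(edges, u):
    """Length of the longest path to u; assumes every cycle of the graph contains u."""
    pred = predecessors(edges)
    pending = {}                      # successors of each vertex not yet processed
    for a, _ in edges:
        if a != u:
            pending[a] = pending.get(a, 0) + 1
    dist, longest, queue = {u: 0}, {}, deque([u])
    while queue:
        x = queue.popleft()
        for p in pred.get(x, []):
            if p == u:
                continue
            longest[p] = max(longest.get(p, 0), dist[x] + 1)
            pending[p] -= 1
            if pending[p] == 0:       # all successors of p are done, so p is final
                dist[p] = longest[p]
                queue.append(p)
    return dist


edges = {(1, 2), (2, 3), (1, 3), (3, 1)}
print(minimal_distances(edges, 3), maximal_distances(edges, 3))
```

In the example vertex 1 can reach 3 directly or via 2, so its minimal distance is 1 and its maximal distance is 2.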

The valuation(H) function produces a locally progressive valuation for the
graph $H$ that is optimal for player 1. It does this by splitting the graph into
a set of subgraphs for which subvaluation() can compute a locally progressive
valuation of this kind.

The function searches a loop and then the set of vertices $R$ from which
this loop can be reached. Computing the locally progressive valuation for the
subgraph induced by $R$ is done by subvaluation(). The rest of the graph (i.e.,
the subgraph induced by $V_H \setminus R$) is recursively treated in the same way.



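valuation(H):

 1. for each $v \in V$ do
 2.     $\varphi(v) = \bot$
 3. for each $w \in V$ (ascending order with respect to $\prec$) do
 4.     if $\varphi(w) = \bot$ then
 5.         $L$ = reach($H|_{\{v \in V \mid v \le w\}}$, $w$)
 6.         if $E_H \cap \{w\} \times L \neq \emptyset$ then
 7.             $R$ = reach($H$, $w$)
 8.             $\varphi|_R$ = subvaluation($H|_R$, $w$)
 9.             $E_H = E_H \setminus (R \times (V \setminus R))$
10. return $\varphi$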



Fig. 2. This function computes for graph H a locally progressive valuation that is 
optimal for player 1. 



The algorithm valuation(H) is given in Fig. 2. It works as follows: 



1. $\varphi$ is set 'undefined' for all $v$.

2. In ascending reward order those vertices $w$ are found which belong to a
   loop $L$ consisting solely of $\le$-smaller vertices. Then, for a fixed $w$, the
   set $R_w$ of all vertices is determined from which this loop (and hence $w$) is
   reachable (excluding vertices which have been used in sets $R_{w'}$ for previously
   considered $w'$). The valuation is updated on the set $R_w$.

   The new valuation is determined as follows. By backward depth-first search
   from $w$ the vertices $v$ in $R_w$ are scanned, and edges leading outside $R_w$
   deleted, in order to prohibit entrance to $R_w$ by a later search (from a
   different $w'$).

   In subvaluation() the role of vertices $v$ in $R_w$ w.r.t. $\prec_w$ is analyzed. We
   shall have $\varphi_0(v) = w$ for them. One proceeds in decreasing relevance order:

   - If $u$ happens to be $w$, a backward breadth first search is done and only
     the distance decreasing edges are kept.

   - Otherwise one distinguishes whether $u$ is positive or negative. If $u$ is
     positive, one computes the set $U$ of vertices from where $w$ can be reached
     while avoiding $u$, and for those $v$ from which a visit to $u$ is unavoidable,
     add $u$ to $\varphi_1(v)$ and delete edges from $v$ to vertices in $U$.

   - If $u$ is negative then let $U$ be the set of vertices $v$, such that $u$ can be
     visited on a path from $v$ to $w$. We add $u$ to $\varphi_1(v)$ for all $v \in U$, and we
     remove edges leading from $U \setminus \{u\}$ to $V \setminus U$.

The function subvaluation() is given in Fig. 3. It computes the paths to
$w$ that have minimal reward with respect to $\prec_w$ and stores for each vertex the
resulting path in the $\varphi$-value. Edges that belong to paths with costs that are not
minimal are removed successively.



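subvaluation(K, w):

 1. for each $v \in V_K$ do
 2.     $\varphi_0(v) = w$
 3.     $\varphi_1(v) = \emptyset$
 4. for each $u \in \{v \in V_K \mid v > w\}$ (descending order with respect to $<$) do
 5.     if $u \in V_+$ then
 6.         $U$ = reach($K|_{V_K \setminus \{u\}}$, $w$)
 7.         for each $v \in V_K \setminus U$ do
 8.             $\varphi_1(v) = \varphi_1(v) \cup \{u\}$
 9.         $E_K = E_K \setminus ((U \cup \{u\}) \times (V \setminus U))$
10.     else
11.         $U$ = reach($K|_{V_K \setminus \{w\}}$, $u$)
12.         for each $v \in U$ do
13.             $\varphi_1(v) = \varphi_1(v) \cup \{u\}$
14.         $E_K = E_K \setminus ((U \setminus \{u\}) \times (V \setminus U))$
15. if $w \in V_+$ then
16.     $\varphi_2$ = maximal_distances($K$, $w$)
17. else
18.     $\varphi_2$ = minimal_distances($K$, $w$)
19. return $\varphi$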



Fig. 3. This function computes a locally progressive valuation for a subgraph K with 
most relevant vertex w. 



5 Time Complexity 

In the analysis of the running time of a strategy improvement algorithm there 
are two parameters of major interest: 

1. the time needed to perform a single strategy improvement step, 

2. the number of strategy improvement steps needed. 

We argue that our discrete strategy improvement algorithm for parity games
achieves a satisfactory bound on the former parameter. Let $n$ be the number of
vertices, and let $m$ be the number of edges in the game graph.

Proposition 3. A single strategy improvement step, i.e., lines 4-9 in Figure 1,
is carried out in time $O(nm)$.

A satisfactory analysis of the latter parameter is missing in our work. Despite
the long history of strategy improvement algorithms for stochastic and payoff
games [8,4,15], very little is known about the number of strategy improvement steps
needed. The best upper bounds are exponential [4,15] but to the best of our knowledge
no examples are known which require more than a linear number of improvement
steps. We believe that our purely discrete description of strategy improvement
gives new insights into the behaviour of the algorithm in the special case of
parity games. Two long-standing questions remain open: whether there is a polynomial time
algorithm for solving parity games [7], and, more concretely, whether a strategy
improvement algorithm for parity games terminates in polynomial time [4,15].
Below we discuss some disjoint observations we have come up with
so far, and some questions which we believe are worth pursuing.

We say that a vertex is switchable if the condition in line 8 of the algorithm
in Figure 1 is satisfied; we say that a vertex is switched in line 9 of Figure 1.
Note that switching an arbitrary non-empty subset of the set of switchable
vertices in every strategy improvement step gives a correct strategy improvement
algorithm for parity games. Therefore, one can view our algorithm as a generic
algorithm which can be instantiated to a fully deterministic algorithm by
providing a policy for choosing the set of vertices to switch in every strategy
improvement step. Melekopoglou and Condon [13] exhibit families of examples on
which several natural policies switching only one switchable vertex in every
strategy improvement step require an exponential number of strategy improvement
steps. It is open whether there are families of examples of parity games on which
there are policies requiring a super-polynomial number of strategy improvement
steps. On the other hand, for every parity game and every initial strategy, there
exists a policy requiring only a linear number of strategy improvement steps.

Proposition 4. For every parity game and initial strategy, there is a policy for 
which the strategy improvement algorithm switches every vertex at most once, 
and therefore it terminates after at most n strategy improvement steps. 

This contrasts with an algorithm for solving parity games based on progress 
measures [10], for which there are families of examples on which every policy 
requires an exponential number of steps. 

The examples of Melekopoglou and Condon [13] are Markov decision processes,
i.e., one-player simple stochastic games [3]. It is an open question whether the
strategy improvement algorithm using the standard policy, i.e., switching all
switchable vertices in every strategy improvement step, works in polynomial
time for one-player simple stochastic games [13]. In contrast, our discrete
strategy improvement algorithm terminates in polynomial time for one-player parity
games.

Proposition 5. The discrete strategy improvement algorithm terminates after
0{n^) strategy improvement steps for one-player parity games.

Most algorithms for solving parity games studied in the literature have $O((n/d)^d)$
or $O((n/d)^{d/2})$ worst-case running time bounds (see [10] and references therein),
where $d$ is the number of different priorities assigned to vertices. The best upper
bound we can give at the moment for the number of strategy improvement steps
needed by our discrete strategy improvement algorithm is the trivial one, i.e.,
the number of different strategies for player 0, which can be

Proposition 6. The discrete strategy improvement algorithm terminates after
$\prod_{v \in V_0} \text{out-deg}(v)$ many strategy improvement steps.
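The trivial bound of Proposition 6 is simply the number of memoryless strategies of player 0, since a strategy picks one outgoing edge at every vertex of player 0. A small sketch, assuming the out-degrees are supplied as a dictionary (the name `strategy_bound` is hypothetical):

```python
from math import prod


def strategy_bound(out_degree):
    """Number of memoryless strategies: the product of the out-degrees of player 0 vertices."""
    return prod(out_degree.values())


# Hypothetical game with three player 0 vertices of out-degree 2, 3 and 1.
print(strategy_bound({"a": 2, "b": 3, "c": 1}))
```

Already with out-degree 2 at every vertex this product doubles with each additional vertex of player 0, which is why the bound is called trivial.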

There is, however, a variation of the strategy improvement algorithm for
parity games, for which the number of strategy improvement steps is bounded by
$O((n/d)^d)$.

Proposition 7. There is a strategy improvement algorithm for parity games
which terminates after $O((n/d)^d)$ improvement steps, and a single strategy
improvement step can be performed in time.

Note that in every strategy improvement step the current valuation strictly
improves in at least one vertex. We say that a strategy improvement step is
substantial if in the current valuation the first component of a profile of some vertex
strictly improves. Observe that there can be at most $O(n^2)$ substantial strategy
improvement steps, since the first component of each of the $n$ vertices is itself a
vertex and can therefore strictly improve fewer than $n$ times. It follows that in
search for superpolynomial examples one has to manufacture gadgets allowing long
sequences of non-substantial strategy improvement steps.

We have collected a little experimental evidence that in practice most
improvement steps are non-substantial. There are few interesting scalable families of
hard examples of parity games known in the literature. Using an implementation of
our discrete strategy improvement algorithm due to the first author [16] we have
run some experiments on families of examples taken from [2] and from [10], and
on a family of examples mentioned in [10] which make Zielonka's version [20] of
McNaughton's algorithm [11] work in exponential time. For all these families
only a linear number of strategy improvement steps were needed and, interestingly,
the number of non-substantial strategy improvement steps was in all cases
constant, i.e., independent of the size of the game graph.


Abstract. In this paper we address the problem of distributing model
checking of timed automata. We demonstrate through four real life
examples that the combined processing and memory resources of
multiprocessor computers can be effectively utilized. The approach assumes a
distributed memory model and is applied to both a network of workstations
and a symmetric multiprocessor machine. However, certain unexpected
phenomena have to be taken into account. We show how in the
timed case the search order of the state space is crucial for the effectiveness
and scalability of the exploration. An effective heuristic to counter
the effect of the search order is provided. Some of the results open up
for improvements in the single processor case.



1 Introduction 

The technical challenge in model checking is in devising algorithms and data
structures that allow one to handle large state spaces. Over the last two decades
numerous approaches have been developed that address this problem: symbolic
methods such as BDDs, methods that exploit symmetry, partial order reduction
techniques, etc. [4]. One obvious approach that has been applied successfully by
a number of researchers is to parallelize (or distribute) the state space search [1,
15]. Distributed reachability analysis and state-space generation have also been
investigated in the related field of performance analysis in the context of
stochastic Petri nets [3,8] (see the second paper for further references). Since the
state of the art in model checking and performance analysis is still progressing
very fast, it does not make sense to develop parallel or distributed tools from
scratch. Rather, the goal should be to view parallelization as an orthogonal
feature, which can always be easily added when the appropriate hardware is
available.

* Research supported by Esprit Project 26270, Verification of Hybrid Systems (VHS).

To some extent this goal has been achieved in the work of [3,15,8], all with
very similar solutions. Stern and Dill [15], for example, present a simple but
elegant approach to parallelize the Murφ tool [5] using the message passing
paradigm. In parallel Murφ, the state table, which stores all reached protocol
states, is partitioned over the nodes of the parallel machine. Each node maintains
a work queue of unexplored states. When a node generates a new state, the
owning node for this state is calculated with a hash function and the state is
sent to this node; this policy implements randomized load balancing. In the
case of Murφ, the algorithm of Stern and Dill achieves close to linear speedup.
We applied the approach of Stern and Dill to parallelize Uppaal [11], a model
checker for networks of extended timed automata. We experimented with parallel
Uppaal using four existing case studies: DACAPO [13], communication [7] and
power-down [6] protocols used in B&O audio/video equipment, and a model of
a buscoupler.
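The hash-based ownership policy described above can be sketched in a few lines. This is an illustration of the general idea only and not the code of parallel Murφ or Uppaal: the choice of SHA-256 over the printed representation of a state, and the names `owner` and `distribute`, are assumptions.

```python
import hashlib


def owner(state, num_nodes):
    """Index of the node that owns `state`, derived from a stable hash of the state."""
    digest = hashlib.sha256(repr(state).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(states, num_nodes):
    """Place every generated state in the work queue of its owning node."""
    queues = [[] for _ in range(num_nodes)]
    for state in states:
        queues[owner(state, num_nodes)].append(state)
    return queues


# Hypothetical states: pairs of a location name and a counter value.
states = [("loc%d" % i, i) for i in range(20)]
for node, queue in enumerate(distribute(states, 4)):
    print(node, len(queue))
```

Because the owner depends only on the state itself, any node that generates the same state again sends it to the same owner, where the duplicate can be detected locally; a well-mixing hash spreads the states roughly evenly, which is the randomized load balancing mentioned in the text.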

In the case of timed automata, the state space is uncountably infinite, and
therefore one is forced to work with symbolic states, which are finite
representations of possibly infinite sets of concrete states. A key problem we had to face in
our work is that the number of symbolic states that has to be explored depends
on the order in which the state exploration proceeds. In particular, the number
of states tends to grow if state space exploration is parallelized. The main
contribution of this paper consists of an effective heuristic which ensures that the
growth of the number of states remains within acceptable bounds. As a result
we manage to obtain close to double linear speedups for the B&O protocols and 
the buscoupler. For the DACAPO example the speedup is not so good, probably 
because the state space is so small that only a few nodes are involved in the 
computation at a time. Some of the results open up for improvements in the 
single processor case. 

The rest of this paper is structured as follows: Section 2 reviews the notion 
of timed automata. Section 3 describes our approach to distributed timed model 
checking. Section 4 presents experimental results, and Section 5 summarizes 
some of the conclusions. 

2 Model Checking Timed Automata 

In this section we briefly review the notion of timed automata that underlies the 
Uppaal tool. For a more extensive introduction we refer to [2,10]. For reasons of 
simplicity and clarity in presentation we have chosen to only give the semantics 
and exploration algorithms for timed automata. The techniques described in this 
paper extend easily to networks of timed automata, even when extended with 
shared variables as is the case in Uppaal. 

Timed automata are finite automata extended with real-valued clocks.
Figure 1 depicts a simple two node timed automaton. As can be seen, both the
locations and edges are labeled with constraints on the clocks. Given a set of clocks
$C$, we use $\mathcal{B}(C)$ to stand for the set of formulas that are conjunctions of atomic
constraints of the form $x \bowtie n$ and $x - y \bowtie n$ for $x, y \in C$, $\bowtie \in \{<, \le, =, \ge, >\}$
and $n$ being a natural number. Elements of $\mathcal{B}(C)$ are called clock constraints
over $C$. $\mathcal{P}(C)$ denotes the power set of $C$.

218 G. Behrmann, T. Hune, and F. Vaandrager

[Figure: timed automaton with the labels $x > 1$ and $x := 0$]

Fig. 1. A simple two state timed automaton with a single clock x.

Definition 1. A timed automaton $A$ over clocks $C$ is a tuple $(L, l_0, E, I)$ where
$L$ is a finite set of locations, $l_0$ is the initial location, $E \subseteq L \times \mathcal{B}(C) \times \mathcal{P}(C) \times L$
is the set of edges, and $I : L \to \mathcal{B}(C)$ assigns invariants to locations. In the case
of $(l, g, r, l') \in E$, we write $l \xrightarrow{g,r} l'$.

Formally, clock values are represented as functions called clock assignments 
from C to the non-negative reals R>o- We denote by R'" the set of clock assig- 
nments for C. The state space of an automaton A is L x R^. The semantics of 
a timed automaton A is defined as a transition system: 

— {I, u) -G {l,u + d) if u G I{1) and u + d G I{1) 

— {I, u) — >■ (/', u') if there exist g and r s.t. I l',u G g and u' = [r i-G- 0]m 

where for $d \in \mathbb{R}$, $u + d$ maps each clock $x$ in $C$ to the value
$u(x) + d$, and $[r \mapsto 0]u$ denotes the assignment for $C$ which maps each
clock in $r$ to the value 0 and agrees with $u$ over $C \setminus r$. In short,
the first rule describes delay and the second edge transitions. It is easy to
see that the state space is uncountable. However, it is a well-known fact that
timed automata have a finite-state symbolic semantics [2] based on countable
symbolic states of the form $(l, D)$, where $D \in \mathcal{B}(C)$:

- $(l, D) \to (l, \mathrm{norm}(M, (D \wedge I(l))^{\uparrow} \wedge I(l)))$

- $(l, D) \to (l', r(g \wedge D \wedge I(l)) \wedge I(l'))$ if $l \xrightarrow{g,r} l'$.

where $D^{\uparrow} = \{u + d \mid u \in D \wedge d \in \mathbb{R}_{\geq 0}\}$
(the future operation), and $r(D) = \{[r \mapsto 0]u \mid u \in D\}$. The
function $\mathrm{norm} : \mathbb{N} \times \mathcal{B}(C) \to \mathcal{B}(C)$
normalizes the clock constraints with respect to the maximum constant $M$ of
the timed automaton. Normalizing the clock constraints guarantees a finite
state space. We refer to [2,10] for an in-depth treatment of the subject.
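
The two rules of the concrete semantics can be illustrated with a small Python
sketch. It is only an illustration under stated assumptions: the names `delay`,
`reset`, `delay_step` and `edge_step`, the representation of guards and
invariants as Python predicates, and the one-edge automaton at the bottom are
all hypothetical and not taken from Uppaal.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Clocks = Dict[str, float]  # a clock assignment u
Edge = Tuple[str, Callable[[Clocks], bool], FrozenSet[str], str]  # (l, g, r, l')


def delay(u: Clocks, d: float) -> Clocks:
    """u + d: every clock advances by the same amount d."""
    return {x: v + d for x, v in u.items()}


def reset(u: Clocks, r: FrozenSet[str]) -> Clocks:
    """[r -> 0]u: clocks in r become 0, all others keep their value."""
    return {x: (0.0 if x in r else v) for x, v in u.items()}


def delay_step(loc, u, d, invariant) -> Optional[Tuple[str, Clocks]]:
    """Delay rule: allowed only if I(l) holds before and after the delay."""
    v = delay(u, d)
    return (loc, v) if invariant(loc, u) and invariant(loc, v) else None


def edge_step(loc, u, edge: Edge) -> Optional[Tuple[str, Clocks]]:
    """Edge rule: take (l, g, r, l') if we are in l and u satisfies g."""
    src, guard, r, dst = edge
    return (dst, reset(u, r)) if src == loc and guard(u) else None


# Hypothetical automaton: one edge l0 -> l1 guarded by x > 1 that resets x.
edge = ("l0", lambda u: u["x"] > 1, frozenset({"x"}), "l1")
no_invariant = lambda loc, u: True  # assumption: all invariants are "true"

state = delay_step("l0", {"x": 0.0}, 1.5, no_invariant)  # let 1.5 time units pass
after = edge_step(*state, edge)                          # then take the edge
```

Because `d` ranges over the reals, enumerating such concrete states is
impossible in general, which is why the symbolic semantics works on sets of
assignments instead.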

The state space exploration algorithm is shown in Fig. 2. Central to the 
algorithm are two data structures: the waiting list, which contains unexplored 
but reachable symbolic states, and the passed list, which contains all explored 
symbolic states. An important optimization, often ignored in the literature,
is to check for state coverage in both lists. Instead of only checking whether a

    Passed := $\emptyset$
    Waiting := $\{(l_0, D_0)\}$
    repeat
        get $(l, D)$ from Waiting
        if $D \not\subseteq D'$ for all $(l, D') \in$ Passed then
            add $(l, D)$ to Passed
            Succ := $\{(l', D') : (l, D) \to (l', D') \wedge D' \neq \emptyset\}$
            for all $(l', D') \in$ Succ do
                put $(l', D')$ in Waiting
            od
        end if
    until Waiting = $\emptyset$

Fig. 2. Sequential symbolic state space exploration for timed automata. 



symbolic state is already included in the list, Uppaal searches for states in the
list that either cover the new state or are covered by it. In the first case the
new state is discarded and in the latter case it replaces the existing state
covered by it. We will return to this matter in Section 3.
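
The loop of Fig. 2, together with the coverage check on the passed list, can be
sketched in Python as follows. This is a toy illustration, not Uppaal's
implementation: zones are stood in for by `frozenset` objects so that zone
inclusion becomes the subset test `<=`, and `explore` and `toy_successors` are
hypothetical names.

```python
from collections import deque


def explore(initial, successors):
    """Sequential exploration over symbolic states (location, zone).

    Assumption: a zone is modelled as a frozenset, so `a <= b` plays the
    role of zone inclusion. `successors(loc, zone)` yields successor states.
    """
    passed = {}                  # location -> list of stored zones
    waiting = deque([initial])   # FIFO queue, i.e. breadth-first order
    explored = 0
    while waiting:
        loc, zone = waiting.popleft()
        stored = passed.get(loc, [])
        if any(zone <= seen for seen in stored):
            continue             # covered by a passed state: discard
        # The new state replaces passed states that it covers.
        passed[loc] = [z for z in stored if not z <= zone] + [zone]
        explored += 1
        for succ_loc, succ_zone in successors(loc, zone):
            if succ_zone:        # skip empty zones
                waiting.append((succ_loc, succ_zone))
    return passed, explored


def toy_successors(loc, zone):
    """Hypothetical system: 'a' has two successors in 'b', one covering the other."""
    if loc == "a":
        return [("b", zone), ("b", zone | {99})]
    return []


passed, explored = explore(("a", frozenset({1})), toy_successors)
```

In this toy run the smaller zone for `b` is explored first and later replaced
in the passed list by the larger one, so three states are explored although
only two are kept.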



3 Distributed Model Checking of Timed Automata 



The approach we have used for distributing the exploration algorithm is similar 
to the one presented in [3,15,8]. Each node executes the same algorithm (see 
Fig. 3) which is a variant of the sequential algorithm shown in Fig. 2. Since we 
assume a distributed memory model, all variables are local. Each node is assigned 
a part of the state space according to a distribution function mapping symbolic 
states to nodes. Whenever a new symbolic state is encountered it is sent to the 
node responsible for exploring and storing that particular state. Each time a 
state has been explored and its successors have been sent, all states waiting to 
be received are received and put into the waiting list. If there are no states in 
the waiting list, the node waits until a state arrives. Although all nodes run the 
same algorithm, each node knows its own id and one node is the master node. 
This node is responsible for calculating the initial state and sending it to the 
owning node, and for deciding when the verification has finished. The verification 
terminates when there are no more states waiting to be explored in the waiting 
lists and there are no messages in transit. When the master finds out that the 
verification is finished it sends a termination signal to all the nodes. 
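
The scheme just described can be mimicked in a single process, which makes the
roles of the distribution function and of the termination condition easy to
see. The sketch below is a simulation under simplifying assumptions, not the
MPI-based implementation: nodes are visited round-robin, message passing is
modelled by per-node `inbox` queues, zones are `frozenset` stand-ins, and
`owner` and `distributed_explore` are hypothetical names.

```python
import zlib
from collections import deque


def owner(loc, n_nodes):
    """Distribution function: depends on the location only, never on the zone."""
    return zlib.crc32(repr(loc).encode("utf-8")) % n_nodes


def distributed_explore(initial, successors, n_nodes):
    inbox = [deque() for _ in range(n_nodes)]    # messages in transit
    waiting = [deque() for _ in range(n_nodes)]  # local waiting lists
    passed = [dict() for _ in range(n_nodes)]    # local passed lists
    # The master computes the initial state and sends it to its owner.
    inbox[owner(initial[0], n_nodes)].append(initial)
    # Terminate when no state is waiting and no message is in transit.
    while any(inbox) or any(waiting):
        for node in range(n_nodes):
            waiting[node].extend(inbox[node])    # receive pending states
            inbox[node].clear()
            if not waiting[node]:
                continue
            loc, zone = waiting[node].popleft()
            seen = passed[node].setdefault(loc, [])
            if any(zone <= z for z in seen):
                continue                         # covered: do not explore
            seen.append(zone)
            for succ in successors(loc, zone):
                if succ[1]:                      # skip empty zones
                    inbox[owner(succ[0], n_nodes)].append(succ)
    return passed


def chain(loc, zone):
    """Hypothetical system: locations 0..5 in a chain."""
    return [(loc + 1, zone)] if loc < 5 else []


result = distributed_explore((0, frozenset({0})), chain, 3)
stored_on = {loc: node for node, p in enumerate(result) for loc in p}
```

Each location ends up in the passed list of exactly the node that `owner`
assigns to it, which is what makes local inclusion checks sufficient.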



3.1 Nondeterminism and Search Orders 

When exploring a state space using Uppaal, one can choose between breadth-first
or depth-first search order corresponding to a queue or a stack implementation
of the waiting list, respectively. In a distributed search one must still choose

    Passed := $\emptyset$
    Waiting := $\emptyset$
    repeat
        receive states and place them in Waiting
        get $(l, D)$ from Waiting
        if $D \not\subseteq D'$ for all $(l, D') \in$ Passed then
            add $(l, D)$ to Passed
            Succ := $\{(l', D') : (l, D) \to (l', D') \wedge D' \neq \emptyset\}$
            for all $(l', D') \in$ Succ do
                send $(l', D')$ to $h(l')$
            od
        end if
    until terminate

Fig. 3. The distributed state space exploration algorithm. 



whether each node uses a queue or a stack; we will call this “distributed breadth- 
first” and “distributed depth-first” order, respectively. This only tells in what 
mode the single nodes run. In general the search order will be nondeterministic 
and may change from execution to execution. 

In a distributed breadth-first search the states are explored in order of arrival 
at each node. However, the order in which states arrive at a node (enter the 
waiting list) will differ between executions. Some reasons for this are varying 
communication delays, and different workloads on the nodes. This means that 
in general states will not be searched in breadth-first order. 

The main difference between a depth-first search and a distributed depth-first
search is that in the single processor case only one path is explored at a time
while in the distributed case more paths are explored at the same time. This is
because all successors of a state are generated and sent to their owning nodes,
where the search is continued in parallel. When the waiting list is implemented
as a stack, small changes in the order in which states arrive may significantly
change the search order. Assume two states $\alpha$ and $\beta$ arrive at a node
while it is exploring a state, with $\alpha$ arriving last (so $\alpha$ will be
on the top of the stack). The successors of $\alpha$ are generated and sent to
their owning nodes. One or more of these may go to the node itself, which means
that they are explored before $\beta$ (because states are received before a new
state is popped from the waiting list), and the same for their successors and
so on. It may thus occur that $\beta$ has to wait a long time before it is
explored even though it arrived at almost the same time as $\alpha$. Hence small
changes in the order of arrival of states may change the search order
drastically.

3.2 Why the Search Order Matters 

In a distributed state space search the number of states explored (and thereby
the work done) may differ from run to run. This is because whether a state
is explored or not depends on the states encountered before. As an example,
consider two states $(l, D)$ and $(l, D')$ with the same location vector $l$
but different time zones satisfying $D \subseteq D'$. If $(l, D)$ arrives first
and is explored before the arrival of $(l, D')$, then $(l, D')$ will also be
explored since it is not covered by any state in the passed list (assuming that
there are no other states covering it). Since the successors of $(l, D')$ are
very likely to have larger time zones than the successors of $(l, D)$, these
will also be explored later. However, if $(l, D')$ arrives first and is
explored before $(l, D)$ arrives, then $(l, D)$ will not be explored because it
is covered by a state in the passed list. This also means that no successors of
$(l, D)$ will be generated or explored.
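
The effect of the arrival order can be reproduced with a few lines of Python.
The sketch is deliberately minimal and rests on assumptions: zones are modelled
as `frozenset` objects (so `<=` is inclusion), all arriving states share one
location vector, and `count_explored` is a hypothetical helper.

```python
def count_explored(arrivals):
    """Count how many of the arriving zones get explored, in arrival order."""
    passed = []
    explored = 0
    for zone in arrivals:
        if any(zone <= seen for seen in passed):
            continue          # covered by a passed state: not explored
        passed.append(zone)
        explored += 1
    return explored


small = frozenset({1, 2})     # plays the role of D
large = frozenset({1, 2, 3})  # plays the role of D', with D included in D'

small_first = count_explored([small, large])
large_first = count_explored([large, small])
```

With the smaller zone arriving first both states are explored; with the larger
zone arriving first only one is, and the saving propagates to all successors.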

Earlier experiments with the sequential version of Uppaal showed that
breadth-first search is often much faster than depth-first search when
generating the complete state space. This comes from the fact that depth-first
search order causes a higher degree of fragmentation of the zones than
breadth-first order, resulting in a higher number of symbolic states being
generated.

As noted above, the distributed algorithm neither realizes a strict breadth- 
first nor depth-first search. When using a queue on each node, the algorithm 
approximates breadth-first search. In fact, on a single node the search order 
will be breadth-first. As we increase the number of nodes, chances increase that 
the nondeterministic nature of the communication causes the ordering within 
the queue to be such that some states with a large depth (distance from the 
initial state) are explored before other states with a smaller depth. In cases 
where breadth-first is actually the optimal search order, increasing the number 
of nodes is bound to increase the number of symbolic states explored. 

Since it seems that breadth-first order in most cases is the optimal search 
order we propose a heuristic for making a distributed breadth-first order closer 
to breadth-first order. The heuristic keeps the states in each waiting list ordered 
by depth, for example by using a priority queue. This guarantees that the state 
in the waiting list with the smallest depth is explored first. In Section 4, we will 
demonstrate that this heuristic drastically reduces the rate at which the number 
of symbolic states increases when the number of nodes grows. In some cases it 
actually decreases the number of states explored. 
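
A depth-ordered waiting list of the kind the heuristic calls for can be
sketched with Python's standard `heapq` module. The class name
`DepthOrderedWaiting` and the tie-breaking rule (arrival order among states of
equal depth) are assumptions made for this illustration; the paper only states
that a priority queue ordered by depth may be used.

```python
import heapq
from itertools import count


class DepthOrderedWaiting:
    """Waiting list that always hands out a state of smallest depth."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: keep arrival order per depth

    def put(self, depth, state):
        heapq.heappush(self._heap, (depth, next(self._arrival), state))

    def get(self):
        depth, _, state = heapq.heappop(self._heap)
        return depth, state

    def __bool__(self):
        return bool(self._heap)


waiting = DepthOrderedWaiting()
for depth, name in [(3, "c"), (1, "a"), (2, "b"), (1, "a2")]:
    waiting.put(depth, name)     # states arrive in a "wrong" order

order = []
while waiting:
    order.append(waiting.get())
```

The arrival counter also prevents `heapq` from ever comparing two states
directly, so states do not need to be orderable themselves.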

3.3 Distribution Functions and Locality 

On the one hand, a good distribution function should guarantee a uniform work
load for the nodes; on the other hand, it should reduce communication between
nodes. Since these objectives in most cases contradict each other, one has to
find a suitable tradeoff. We therefore considered several distribution functions.

As in [15], most of our results are based on using a hash function as the 
distribution function. However, to make the inclusion checks of the time zones 
in the waiting and the passed lists possible, states with the same location vector 
must be mapped to the same node. The hash value of a symbolic state is therefore 
only based on the location vector and not on the complete state. 

One possible hashing function is the one already implemented in Uppaal
and used when states are stored in the passed list. It uniquely maps each state
to an integer modulo the size of the hash table. Experiments have shown that
it distributes location vectors uniformly. Trying to increase locality of the
distribution function, it should be possible to use the fact that transitions
only change a small part of the location vector and only some transitions change
the integer variables. If we consider a state $\alpha$ and a successor $\beta$,
we can expect most locations and integer variables in $\beta$ to be the same as
in $\alpha$. Section 4 reports on experiments where the distribution function
only hashes on part of the location vector or only on the integer variables.
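
The following Python sketch shows what such distribution functions might look
like. It is an assumption-laden illustration: a state is modelled as a triple
`(locations, variables, zone)`, `zlib.crc32` stands in for Uppaal's own hash
function (which is not described here), and the function names merely echo the
labels D0, D2 and D3 used in Section 4.3.

```python
import zlib


def _bucket(key, n_nodes):
    """Deterministic stand-in for a hash function, reduced modulo the node count."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_nodes


def d0_discrete_part(state, n_nodes):
    """Hash on locations and integer variables; the zone is ignored."""
    locations, variables, _zone = state
    return _bucket((tuple(locations), tuple(variables)), n_nodes)


def d2_integer_variables(state, n_nodes):
    """Hash on the integer variables only."""
    _locations, variables, _zone = state
    return _bucket(tuple(variables), n_nodes)


def d3_every_second_location(state, n_nodes):
    """Hash on every second entry of the location vector."""
    locations, _variables, _zone = state
    return _bucket(tuple(locations[::2]), n_nodes)


s1 = (("a", "b", "c"), (0, 1), "zone-1")
s2 = (("a", "b", "c"), (0, 1), "zone-2")   # same discrete part, other zone
s3 = (("a", "x", "c"), (0, 1), "zone-3")   # one location changed by a transition
```

None of the three functions looks at the zone, so states that must be compared
by inclusion always meet on the same node; `d2_integer_variables` and
`d3_every_second_location` additionally keep `s1` and `s3` together, which is
the locality effect the text aims at.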

Some experiments with model specific distribution functions were done, but 
it was extremely difficult to even approach the performance of the generic distri- 
bution functions. Finding effective model specific distribution functions requires 
much work and a thorough understanding of the given model. 

Within Uppaal the techniques described in [12] for reducing memory consumption
by only storing loop entry points in the passed list are quite important
for verifying large models. The idea is to keep a single state from every static
loop (static loops are simple to compute). This guarantees termination while
giving considerable reductions in memory consumption for some models. Uppaal
implements two variations of this technique. The most aggressive one, described
in [12], only stores loop entry points. While reducing memory consumption,
this technique may increase the number of states explored, since certain
states are explored more than once. A less aggressive approach is to also store
all non-committed states (in which no automaton is in a committed location) in
the passed list. Experiments show that this is a good compromise between space
and speed.

We propose using this technique to increase locality in the exploration. Since
non loop entry points are not stored on the passed list they might as well be
explored by the node which computed the state in the first place instead of
sending it to another node, thereby increasing locality. Consider, for example,
a state $\alpha$ and its successor $\beta$. If $\beta$ is not a loop entry point
and therefore is not going to be stored on the passed list, we may as well
explore $\beta$ on the same node as $\alpha$. Section 4 reports on experiments
with this technique.
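
The routing decision behind this idea fits in a few lines. The sketch below is
hypothetical throughout: `destination`, the predicate `stored_in_passed`, the
set `LOOP_ENTRIES` and the stand-in distribution function `toy_owner` are
illustration names, not part of Uppaal.

```python
def destination(current_node, successor, stored_in_passed, owner, n_nodes):
    """Decide which node explores `successor`.

    States that will be stored in the passed list go to their owning node;
    all other states stay on the node that generated them.
    """
    if stored_in_passed(successor):
        return owner(successor, n_nodes)
    return current_node


LOOP_ENTRIES = {"idle"}                        # hypothetical loop entry points
stored = lambda state: state[0] in LOOP_ENTRIES
toy_owner = lambda state, n: len(state[0]) % n  # stand-in distribution function

local = destination(0, ("busy", None), stored, toy_owner, 3)
remote = destination(0, ("idle", None), stored, toy_owner, 3)
```

Only states that take part in inclusion checks on the passed list need a fixed
home node, which is why the remaining states can be kept local without
affecting the stored state space.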

3.4 Generating Shortest Traces 

An important feature of a model checker is its ability to provide good debugging 
information in case a certain property is not satisfied. For a failed invariant 
property this is commonly a trace to the state violating the invariant. Providing 
a short trace increases the value of a trace. One of the features of Uppaal is 
that when the algorithm from Fig. 2 is used with a breadth-first search order, 
the trace to the error state is the shortest possible, since all states that can be 
reached with a shorter trace have been explored before. It would be nice to have 
this feature also in a distributed version of the tool. However, as described above, 
the order of a distributed state space search is non-deterministic, and this may 
lead to non-minimal traces. Fortunately, with little extra computational effort a 
shortest trace can be found regardless of the search order. The idea is to record 
for each symbolic state its “depth”, i.e., the length of the shortest trace leading 
to this state. When a violating state is found the algorithm does not stop, but
instead continues to search for violating states that can be reached with a
shorter trace. We need to make sure that the inclusion checks performed on the
waiting and passed lists do not discard potential violating states. When a new
state $(l, D)$ is added to the waiting or passed list, we normally compare it to
every state $(l, D')$ on the list, and if an inclusion exists we keep the larger
of the two states. In order not to discard potential traces, we add the
restriction that a state is only replaced/discarded if it does not have a
smaller depth than the state it is compared to. The same idea is used for the
decision whether or not to explore a state when looking it up in the passed
list: we only decide not to explore a state if its clock constraints are
included in the clock constraints of another state with the same location
vector and it does not have a smaller depth than the state it is included in.
The corresponding line in the algorithm changes to:

    if $D \not\subseteq D'$ or $\mathrm{depth}(l, D) < \mathrm{depth}(l, D')$ for all $(l, D') \in$ Passed then

With these changes the algorithm in Fig. 3 can find shortest traces independently 
of the ordering used on the waiting list. As described above we have implemented 
a heuristic which approximates breadth-first search. In Section 4, we demonstrate 
that when using this heuristic the extra cost for finding the shortest trace is minor 
and we keep good speedups. 
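
The modified exploration test can be written down directly. In the Python
sketch below, zones are again modelled as `frozenset` objects (an assumption
made for illustration), and `should_explore` is a hypothetical helper operating
on the passed states that share the new state's location vector.

```python
def should_explore(zone, depth, passed_for_location):
    """Depth-aware lookup in the passed list.

    `passed_for_location` holds (zone, depth) pairs with the same location
    vector. A state is skipped only if some passed state covers it AND the
    new state does not have a smaller depth than that passed state.
    """
    return all(
        not zone <= seen_zone or depth < seen_depth
        for seen_zone, seen_depth in passed_for_location
    )


passed = [(frozenset({1, 2, 3}), 5)]

covered_and_deeper = should_explore(frozenset({1, 2}), 7, passed)
covered_same_depth = should_explore(frozenset({1, 2}), 5, passed)
covered_but_shallower = should_explore(frozenset({1, 2}), 3, passed)
not_covered = should_explore(frozenset({4}), 9, passed)
```

A covered state is thus re-explored only when it offers a strictly shorter
trace, which keeps the extra work small when the search order is already close
to breadth-first.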



4 Experimental Results 

For implementing communication between nodes we have used the Message Passing
Interface (MPI) [14]. This facilitates porting and running the program on
different kinds of machines and architectures. We have conducted experiments
on a Sun Enterprise 10000 with 24 333 MHz processors, which has a shared memory
architecture, and on a Linux Beowulf cluster of 10 450 MHz Pentium III CPUs.



4.1 Nondeterminism and Search Orders 

One of the first examples the distributed Uppaal was tried on was a model of
a batch plant [9] constructed to verify schedulability of a production process.
The verification, which took several hours on one processor, took less than
five minutes on 16 nodes. This super linear speedup came as a surprise
to us. Verifying schedulability in this model means searching for a state where
all batches have been processed. For this particular model we had previously
identified depth-first search as the fastest strategy on one node, and therefore
we used a distributed depth-first search. In this particular model, the
verification benefited from the nondeterministic search order. The distributed
depth-first search did not find the same state as the verification on a single
processor, and in fact the number of states searched was not the same in the
two cases. It should be possible to achieve a similar effect with the sequential
algorithm by introducing randomness into the search order. First experiments
with using a kind of random depth-first search have been promising.

Because the cost of checking for a particular state (or a set of states)
depends so heavily on the search order, we have in the remaining experiments
chosen to generate the complete state space of the given system using a
distributed breadth-first search. Generating the complete state space reduces
the impact of the nondeterministic search order because one cannot find a
“lucky” path which finds the state searched for quickly. This makes the results
from different runs comparable.

4.2 Speedup Gained 

We have chosen to focus our experiments around four Uppaal examples: the
start-up algorithm of the DACAPO [13] protocol, which is quite small but had
some interesting behavior as will be discussed later; a communication protocol
used in B&O audio/video equipment (CP) [7]; a power-down protocol also used
in B&O audio/video equipment (PD) [6]; and a model of a buscoupler (which
thus far has not been published). We do not look further at the model of
the batch plant because its state space was too big to be generated completely.
All other known Uppaal examples were also tried, but these were so small that
the complete state space can be generated in a matter of seconds using a few
processors, and were therefore considered too small to be of interest.

The examples were run on the Sun Enterprise on 1, 2, 3, 4, 5, 8, 11, 14, 17, 20
and 23 nodes; and on the Beowulf on 1 to 10 nodes to the extent it was possible
(only the DACAPO model could be run on a single node because of memory
usage). Since the search order (and thereby the work done) is non-deterministic
we repeated one experiment several times. The observed running times¹ and
number of states generated varied less than 3%. Running the experiments only
once therefore seemed reasonable.

When generating the complete state space for a number of examples using 
distributed breadth-first search a general pattern occurred. In most cases the 
number of states generated increased with the number of nodes, and in all cases 
the smallest number of states was generated using one node. It therefore seems 
that in most cases breadth-first is close to the optimal search order for generating 
the complete state space. In most cases the increase in the number of states was 
minor (less than 10%), but for a few examples the increase was substantial. In 
the DACAPO example the number of states more than doubled — from 45000 
states to more than 110000 states using 17 nodes (see Table 1). 

To counter this effect, we applied the heuristic described in Section 3.2 and
used a priority queue to order the states waiting on each node such that the
states with the shortest path to the initial state are searched first. Not only
did this counter the increase in the number of states, it actually decreased the
number of states generated in some cases. This shows that there is still room
for improvement with respect to the search order even when using a single
processor. Tables 1 and 2 show the effect of applying the heuristic to our
examples. As can

¹ When talking about the running time we always consider the time of the slowest
node.

Table 1. States generated with (Priority) and without (FIFO) use of heuristic on Sun
Enterprise.

  #    DACAPO                 CP                     Buscoupler             PD
       FIFO       Priority    FIFO       Priority    FIFO       Priority    FIFO       Priority
  1    45001      44925       3466548    3010244     6502804    6436543     7992048    7992098
  2    45754      44863       5505161    3027728     8042882    6199274     8004165    8003477
  3    69141      45267       5472878    3070491     8064519    6243785     8001670    7997859
  4    62541      45177       5454067    3086016     8123748    6171125     8004717    8004439
  5    78008      45667       5583368    3077890     8651090    6481067     8002412    7998607
  8    77396      46510       5452888    3113378     8359647    6185288     8004898    8004898
  11   84598      46318       5642463    3059169     8968257    6184329     8004888    8004892
  14   108344     49741       5653134    3102709     8914300    6278855     8004888    8004888
  17   110634     52247       5270822    3082967     9049252    6243571     8001813    7996979
  20   98266      47573       5449055    3111333     9271401    6251283     8004881    8004880
  23   104945     52457       5535724    3065916     9146026    6103629     8004714    8004651


Table 2. States generated with (Priority) and without (FIFO) use of heuristic on
Beowulf.

  #    DACAPO                 IR                     Buscoupler             PD
       no order   order       no order   order       no order   order       no order   order
  1    45858      45748       N/A        N/A         N/A        N/A         N/A        N/A
  2    48441      46899       N/A        3028368     N/A        N/A         N/A        N/A
  3    74882      47671       5605882    3053837     N/A        N/A         N/A        N/A
  4    62398      47640       5533159    3058230     15832617   12794520    9473496    9409935
  5    79899      47678       5454676    3060070     16637609   13603603    9432828    9287527
  6    92678      49438       5684749    3133769     20443824   13896789    9511548    9482742
  7    97065      49739       5702856    3074131     20329057   13797531    9513477    9441041
  8    97662      50477       5358514    3106414     22430748   14442925    9527173    9488775
  9    92642      49284       5449403    3071827     21086691   14455201    9535657    9515920
  10   92400      48821       5532205    3060705     20704595   15507978    9526732    9500000

be seen from the tables, the heuristic performs well in three of the four
examples and in the PD example it has no effect. We also tried to use a
distributed depth-first search order, and to 'reverse' the heuristic to first
explore the states with the longest path to the initial state during distributed
depth-first search. In both cases the number of states generated increased
substantially. Therefore these search orders were discarded for the remaining
experiments.

An important question is of course how well the distribution of the search 
scales in terms of number of nodes. Tables 3 and 4 show the running times 
in seconds for the different examples on the Sun Enterprise and the Beowulf, 
respectively. When running on the Sun Enterprise we were able to generate the 
complete state space on a single node for all the examples. We can therefore 
calculate the speedup with respect to running on a single node. The speedups 
we have calculated are normalized with respect to the number of states explored, 
to clarify the effect of the distribution. The speedup for $i$ nodes is
calculated as

$$\mathrm{speedup}(i) = \frac{\text{time on one node} \,/\, \text{states on one node}}{\text{time on } i \text{ nodes} \,/\, \text{states on } i \text{ nodes}}$$

where time on one node is the time for generating the complete state space using
the distributed version running on one node, and time on $i$ nodes is the time
of the slowest node when running on $i$ nodes.
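
As a worked instance of this definition, the Python sketch below plugs in the
CP figures with the heuristic enabled from Tables 1 and 3 (1 node: 732.0 s and
3010244 states; 8 nodes: 86.8 s and 3113378 states). The function name
`normalized_speedup` is a hypothetical helper introduced for the illustration.

```python
def normalized_speedup(time_1, states_1, time_i, states_i):
    """Speedup normalized by the number of states explored in each run."""
    return (time_1 / states_1) / (time_i / states_i)


# CP example, Priority columns of Tables 1 and 3, Sun Enterprise.
raw_speedup_8 = 732.0 / 86.8
cp_speedup_8 = normalized_speedup(732.0, 3010244, 86.8, 3113378)
```

The raw ratio of running times is roughly 8.4, while the normalized speedup is
roughly 8.7, because the 8-node run explored slightly more states than the
single-node run.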



Table 3. Run time (in seconds) with (Priority) and without (FIFO) use of heuristic on
Sun Enterprise.

  #    DACAPO                 CP                     Buscoupler             PD
       FIFO       Priority    FIFO       Priority    FIFO       Priority    FIFO       Priority
  1    8.6        9.0         804.0      732.0       2338.6     2213.8      3362.8     3195.4
  2    5.2        5.0         725.8      351.6       1506.5     861.4       1507.1     1101.2
  3    5.4        3.7         446.4      238.6       773.0      559.4       943.0      649.8
  4    3.9        2.9         317.9      175.2       596.4      413.4       713.4      467.6
  5    4.0        2.5         266.9      142.0       501.2      342.5       453.5      373.1
  8    2.8        2.1         152.8      86.8        283.0      202.3       231.6      226.9
  11   2.6        1.9         121.7      65.3        221.4      148.0       159.9      161.4
  14   2.7        2.0         95.5       53.9        172.2      118.0       127.3      133.4
  17   2.7        2.1         74.2       43.1        145.2      97.7        106.9      102.4
  20   2.4        2.3         66.5       38.8        127.6      83.6        93.0       92.1
  23   2.2        2.4         60.2       34.3        112.4      72.7        76.9       79.6


For the DACAPO example the speedup falls below linear already in the case of
5 nodes. However, it only takes 2.5 seconds to generate the complete state
space using 5 nodes. Since the state space is small, not all nodes can be kept
busy and a relatively large share of the time is spent starting and closing
down the exploration. Therefore, a poor speedup was to be expected. For the CP
example

Table 4. Run time (in seconds) with (Priority) and without (FIFO) use of heuristic on
Beowulf.

  #    DACAPO                 CP                     Buscoupler             PD
       FIFO       Priority    FIFO       Priority    FIFO       Priority    FIFO       Priority
  1    3.88       4.15        N/A        N/A         N/A        N/A         N/A        N/A
  2    3.20       3.16        N/A        682.57      N/A        N/A         N/A        N/A
  3    3.49       2.37        934.44     349.86      N/A        N/A         N/A        N/A
  4    2.88       2.02        540.19     218.94      1060.09    799.69      616.85     541.89
  5    2.71       1.64        390.02     169.93      836.09     646.02      413.45     401.30
  6    2.62       1.52        337.79     144.50      1796.23    563.08      453.20     377.39
  7    2.55       2.47        285.30     124.69      811.78     476.69      343.49     315.69
  8    2.51       1.39        200.50     97.84       782.28     440.87      283.07     274.41
  9    2.23       1.38        178.75     87.38       619.84     394.91      244.72     242.16
  10   2.00       1.19        173.07     82.44       536.74     387.27      214.03     217.98


the speedup is close to linear. However, for the buscoupler and the PD examples
the speedup is super linear, which is surprising since the speedup has been
normalized with respect to the total number of states. Figure 4 shows the graphs
for the speedups of the CP and buscoupler examples. We are not sure about the
reason for these super linear speedups. For the Sun Enterprise machine,
accessing main memory is considered to be a bottleneck. When the number of nodes
used in an exploration increases so does the amount of cache available (each
node has 4Mb of cache). Since Uppaal spends much time looking up states
in the passed and waiting lists, faster access to larger parts of these lists may
increase the speed substantially. This conjecture is supported by the fact that
the examples with the largest number of states (and therefore most accesses to
the passed and waiting lists) gain the largest speedup. The same kind of super
linear speedup was not encountered by Stern and Dill [15]. As mentioned in
their paper, Murφ has implemented a wide range of techniques for minimizing
the state space. This means that, compared to Uppaal, Murφ spends less time
on looking up states and accessing memory, and therefore Murφ does not gain
the same speedup from the larger cache.

On the Beowulf it was in most cases not possible to generate the complete
state space using only one processor. We have therefore chosen to present the
amount of work done, where work for $i$ nodes is defined as the time on $i$
nodes times $i$ divided by the number of states on $i$ nodes, to normalize with
respect to the number of states generated. A horizontal line then corresponds
to a linear speedup. As expected the line for the DACAPO example increases, so
we do not have a linear speedup. The speedup looks better for the CP example on
the Beowulf, but since we do not have the time on one node (this run could not
complete due to memory shortage) it is hard to judge whether the work is
approaching the work on one node or really is decreasing below that. The same is
the case for the buscoupler and the PD example. Figure 5 shows the work for the
CP and buscoupler examples. One interesting point to notice is that for six
nodes without the heuristic the buscoupler performs very poorly (we have no
explanation for this behavior).
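
The work measure can be evaluated directly from the tables. The Python sketch
below uses the DACAPO figures with the heuristic enabled on the Beowulf
(Tables 2 and 4); it assumes that the "order" column of Table 2 corresponds to
the Priority column of Table 4, and `work` is a hypothetical helper name.

```python
def work(time_i, nodes, states_i):
    """Processor-seconds spent per generated state when running on `nodes` nodes."""
    return time_i * nodes / states_i


# DACAPO with the heuristic on the Beowulf: (run time in s, nodes, states).
work_1 = work(4.15, 1, 45748)
work_10 = work(1.19, 10, 48821)
```

Here `work_10` is clearly larger than `work_1`, i.e. the curve rises instead of
staying horizontal, which matches the observation that the small DACAPO model
does not exhibit a linear speedup.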

The explanations we suggest for the super linear speedups we encounter on 
the Beowulf are the same as for the Sun Enterprise: access to a larger amount 
of local (cache) memory. 



4.3 Distribution Functions and Locality 

In most of the experiments, states are distributed evenly among nodes using the 
hash function from Uppaal. However, for small models we observed that some 
nodes explore twice as many states as others because some location vectors have 
more reachable symbolic states than others, which means that some nodes have 
more states allocated than others. Counting the number of different location 
vectors on the different nodes, the distribution again looks uniform. This effect 
does not show up in larger models. 

We ran experiments for different distribution functions: a function hashing
on the discrete part of a state (D0), a function hashing on the complete state
(D1), a function hashing on the integer variables (D2), and a function hashing
on every second location (D3). We also ran experiments for different settings of
the state space reduction technique described in Section 3.3, where only states
that are actually stored in the passed list are mapped to different nodes:
storing all states (S0), storing non-committed or loop entry points (S1), and
storing only loop entry points (S2). Table 5 shows for the buscoupler and the
power-down models the percentage of states explored on the same node they were
generated on. These experiments were run on the Sun Enterprise with 8 CPUs, but
similar results were obtained using the Beowulf cluster.



Table 5. Percent of locally explored states for different distribution and storage policies
for the buscoupler model (left) and the power-down protocol (right) when verified on
a Sun Enterprise using 8 nodes.

  Bus   D0    D1    D2    D3        PD    D0    D1    D2    D3
  S0    14%   n/a   52%   42%       S0    4%    n/a   76%   22%
  S1    36%   n/a   60%   58%       S1    34%   n/a   76%   48%
  S2    55%   n/a   62%   62%       S2    60%   n/a   78%   86%



For the buscoupler with D0 and S0 we almost obtain the expected uniform
distribution (100%/8 = 12.5%). This was not the case for the power-down model
although the total load on the nodes was uniform. None of the D1 experiments
terminated within a reasonable time frame. This was expected since far fewer
inclusion checks can succeed with this distribution function and hence a much
higher number of symbolic states will be generated. Both S1/S2 and D2/D3
improve locality. What cannot be seen is that both S1 and S2 increase the
number of states generated (for the buscoupler to such an extent that S2 is
actually slower than S0). D2 is surprisingly uniform while increasing locality,
but the load distribution of D3 was observed to be highly non-uniform, resulting
in poor performance. For the buscoupler D2 and S1 turned out to be the fastest
combination. For the power-down model D2 and S2 turned out to be the fastest
combination.



4.4 Generating Shortest Traces 

For the buscoupler system we tried the version finding the shortest trace on four
different properties (each requiring finding a particular state rather than
generating the complete state space) on the Sun Enterprise. The speedups are
displayed in Fig. 6. As for the DACAPO system, the speedup for properties one
and two suffers from too few states being explored. The speedup for properties
three and four is much better, but here more states are searched to find the
state satisfying the property. So we can conclude that the version finding
shortest traces also scales quite well, as long as sufficiently many states need
to be generated.

5 Conclusions 

This paper demonstrates the feasibility of distributed model checking of timed 
automata. A side effect of the distribution was an altered search order, which 
in turn increased the number of symbolic states generated when exploring the 
reachable state space. We have proposed explicit ordering of the states in the 
waiting list as an effective heuristic to improve the scalability of the approach. 
In addition we propose an algorithm for finding shortest traces that performs 
well in a distributed model checker. 

In several cases we obtained superlinear speedups. We have suggested some 
explanations, but more work is needed to clarify the observed phenomena. 
Importantly, some of our results suggest possible improvements to the sequential 
state space exploration algorithm for timed automata. 

Abstract. We study model checking problems for pushdown systems 
and linear time logics. We show that the global model checking problem 
(computing the set of configurations, reachable or not, that violate 
the formula) can be solved in $O(g_{\mathcal{P}}^3 g_{\mathcal{B}}^3)$ time and 
$O(g_{\mathcal{P}}^2 g_{\mathcal{B}}^2)$ space, where $g_{\mathcal{P}}$ and $g_{\mathcal{B}}$ are the size 
of the pushdown system and the size of a Büchi automaton for the negation 
of the formula. The global model checking problem for reachable 
configurations can be solved in $O(g_{\mathcal{P}}^4 g_{\mathcal{B}}^3)$ time and 
$O(g_{\mathcal{P}}^4 g_{\mathcal{B}}^2)$ space. In the case of pushdown systems with constant 
number of control states (relevant for our application), the complexity 
becomes $O(g_{\mathcal{P}} g_{\mathcal{B}}^3)$ time and $O(g_{\mathcal{P}} g_{\mathcal{B}}^2)$ space and 
$O(g_{\mathcal{P}}^2 g_{\mathcal{B}}^3)$ time and $O(g_{\mathcal{P}}^2 g_{\mathcal{B}}^2)$ space, respectively. 
We show applications of these results in the area of program analysis 
and present some experimental results. 



1 Introduction 

Pushdown systems (PDSs) are pushdown automata seen under a different light: 
We are not interested in the languages they recognise, but in the transition 
system they generate. These are infinite transition systems having configurations 
of the form (control state, stack content) as states. 

PDSs have already been investigated by the verification community. Model 
checking algorithms for both linear and branching time logics have been 
proposed in [1,2,3,7,11]. The model checking problem for CTL and the mu-calculus 
is known to be DEXPTIME-complete even for a fixed formula [1,11]. In 
contrast, the model checking problem for LTL or the linear time mu-calculus 
is polynomial in the size of the PDS [1,7]. This makes linear time logics 
particularly interesting for PDSs. It must be observed, however, that the model 
checking problem for branching time logics is only exponential in the number of 
control states of the PDS; for a fixed number of states the algorithms of [2,11] 
are polynomial. 

Inspired by the work of Steffen and others on the connection between model 
checking and dataflow analysis (see for instance [9]), it has been recently observed 
that relevant dataflow problems for programs with procedures (so-called 
interprocedural dataflow problems), as well as security problems for Java programs, 
can be reduced to different variants of the model checking problem for PDSs 
and LTL [5,6,8]. Motivated by this application, we revisit the model checking 
problem for linear time logics. We follow the symbolic approach of [1], in which 
infinite sets of configurations are finitely represented by multi-automata. We 
obtain an efficient implementation of the algorithm of [1], which was described 
there as ‘polynomial’, without further details. Our algorithm has the same time 
complexity and better space complexity than the algorithm of [7]. This better 
space complexity turns out to be important for our intended applications to 
dataflow analysis. 

The paper is structured as follows. Sections 2 and 3 contain basic definitions, 
and recall some results of [1]. The ‘abstract’ solution of [1] to the model-checking 
problem is described and refined in Sections 4 to 6. In Section 7 we provide an 
efficient implementation for this solution. Applications and experimental results 
are presented in Section 9. 

All proofs and some constructions have been omitted due to lack of space. 
They can all be found in the technical report version of this paper [4]. 

2 Pushdown Systems and $\mathcal{P}$-Automata 

A pushdown system is a triplet $\mathcal{P} = (P, \Gamma, \Delta)$ where $P$ is a finite set of control 
locations, $\Gamma$ is a finite stack alphabet, and $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is a finite 
set of transition rules. If $((q,\gamma),(q',w)) \in \Delta$ then we write $(q,\gamma) \hookrightarrow (q',w)$ (we 
reserve $\to$ to denote the transition relations of finite automata). 

Notice that pushdown systems have no input alphabet. We do not use them 
as language acceptors but are rather interested in the behaviours they generate. 

A configuration of $\mathcal{P}$ is a pair $(p, w)$ where $p \in P$ is a control location and 
$w \in \Gamma^*$ is a stack content. The set of all configurations is denoted by $\mathcal{C}$. 

If $(q,\gamma) \hookrightarrow (q',w)$, then for every $v \in \Gamma^*$ the configuration $(q,\gamma v)$ is an 
immediate predecessor of $(q',wv)$, and $(q',wv)$ an immediate successor of $(q,\gamma v)$. The 
reachability relation $\Rightarrow$ is the reflexive and transitive closure of the immediate 
successor relation; the transitive closure is denoted by $\stackrel{+}{\Rightarrow}$. A run of $\mathcal{P}$ is a 
maximal sequence of configurations such that for each two consecutive configurations 
$c_i c_{i+1}$, $c_{i+1}$ is an immediate successor of $c_i$. 

The predecessor function $pre : 2^{\mathcal{C}} \to 2^{\mathcal{C}}$ of $\mathcal{P}$ is defined as follows: $c$ belongs to 
$pre(C)$ if some immediate successor of $c$ belongs to $C$. The reflexive and transitive 
closure of $pre$ is denoted by $pre^*$. Clearly, $pre^*(C) = \{ c \in \mathcal{C} \mid \exists c' \in C.\ c \Rightarrow c' \}$. 
Similarly, we define $post(C)$ as the set of immediate successors of elements in $C$ 
and $post^*$ as the reflexive and transitive closure of $post$. 
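
The definitions of immediate successor and of $pre$ can be made concrete with a small sketch. The Python encoding below is an assumption made for illustration only: control locations and stack symbols are strings (`g0` stands for a stack symbol $\gamma_0$), a stack is a tuple with its top at index 0, and the rule table `RULES` is a hypothetical example. Because the set of all configurations is infinite, `pre` is restricted here to a finite set of candidate configurations.

```python
from typing import Dict, List, Set, Tuple

Stack = Tuple[str, ...]
Config = Tuple[str, Stack]                      # (control location, stack content)
Rules = Dict[Tuple[str, str], List[Tuple[str, Stack]]]

# Hypothetical example rules: (q, gamma) maps to a list of (q', w).
RULES: Rules = {
    ("p0", "g0"): [("p1", ("g1", "g0"))],
    ("p1", "g1"): [("p2", ("g2", "g0"))],
    ("p2", "g2"): [("p0", ("g1",))],
    ("p0", "g1"): [("p0", ())],
}

def successors(rules: Rules, config: Config) -> List[Config]:
    """Immediate successors of (q, gamma v): apply every rule with head (q, gamma)."""
    q, stack = config
    if not stack:                               # no rule applies to an empty stack
        return []
    gamma, rest = stack[0], stack[1:]
    return [(q2, w + rest) for (q2, w) in rules.get((q, gamma), [])]

def pre(rules: Rules, targets: Set[Config], candidates: Set[Config]) -> Set[Config]:
    """Candidates that have at least one immediate successor in `targets`."""
    return {c for c in candidates
            if any(s in targets for s in successors(rules, c))}
```

For instance, `successors(RULES, ("p0", ("g0", "g0")))` replaces the topmost `g0` by `g1 g0` and moves to control location `p1`.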



2.1 $\mathcal{P}$-Automata 

Given a pushdown system $\mathcal{P} = (P, \Gamma, \Delta)$, we use so-called $\mathcal{P}$-automata in order 
to represent sets of configurations of $\mathcal{P}$. A $\mathcal{P}$-automaton uses $\Gamma$ as alphabet, 
and $P$ as set of initial states (we consider automata with possibly many initial 
states). Formally, a $\mathcal{P}$-automaton is an automaton $\mathcal{A} = (\Gamma, Q, \delta, P, F)$ where $Q$ 
is the finite set of states, $\delta \subseteq Q \times \Gamma \times Q$ is the set of transitions, $P$ is the set of 
initial states and $F \subseteq Q$ the set of final states. We define the transition relation 
$\to\ \subseteq Q \times \Gamma^* \times Q$ as the smallest relation satisfying: 

- $q \xrightarrow{\varepsilon} q$ for every $q \in Q$, 

- if $(q, \gamma, q') \in \delta$ then $q \xrightarrow{\gamma} q'$, and 

- if $q \xrightarrow{w} q''$ and $q'' \xrightarrow{\gamma} q'$ then $q \xrightarrow{w\gamma} q'$. 

All the automata used in this paper are $\mathcal{P}$-automata, and so we drop the $\mathcal{P}$ 
from now on. An automaton accepts or recognises a configuration $(p, w)$ if $p \xrightarrow{w} q$ 
for some $p \in P$, $q \in F$. The set of configurations recognised by an automaton $\mathcal{A}$ 
is denoted by $Conf(\mathcal{A})$. A set of configurations of $\mathcal{P}$ is regular if it is recognised 
by some automaton. 
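
As an illustration of the acceptance condition, the following sketch checks whether an automaton recognises a configuration by reading the stack content from an initial state. The tuple encoding of transitions and the example automaton (which accepts exactly the configuration with control location `p0` and stack `g0 g0`) are assumptions chosen for this example.

```python
from typing import Set, Tuple

Transition = Tuple[str, str, str]     # (source state, stack symbol, target state)

def accepts(delta: Set[Transition], initial: Set[str], final: Set[str],
            config: Tuple[str, Tuple[str, ...]]) -> bool:
    """True if p --w--> q for the configuration (p, w), p initial and q final."""
    p, w = config
    if p not in initial:
        return False
    current = {p}
    for gamma in w:                   # follow w symbol by symbol
        current = {q2 for (q1, a, q2) in delta if q1 in current and a == gamma}
    return bool(current & final)

# Hypothetical automaton: p0 --g0--> s1 --g0--> s2, with s2 final.
DELTA = {("p0", "g0", "s1"), ("s1", "g0", "s2")}
INITIAL = {"p0", "p1", "p2"}
FINAL = {"s2"}
```

With these definitions, `accepts(DELTA, INITIAL, FINAL, ("p0", ("g0", "g0")))` holds, while the same stack content paired with `p1` is rejected.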

Notation. In the paper, we use the symbols $p, p', p''$ etc., possibly with indices, 
to denote initial states of an automaton (i.e., the elements of $P$). Non-initial 
states are denoted by $s, s', s''$ etc., and arbitrary states, initial or not, by $q, q', q''$. 

3 Model-Checking Problems for Linear Time Logics 

In this section we define the problems we study, as well as the automata-theoretic 
approach. 

Let $Prop$ be a finite set of atomic propositions, and let $\Sigma = 2^{Prop}$. It is well 
known that the semantics of properties expressed in linear time temporal logics 
like LTL or the linear-time $\mu$-calculus are $\omega$-regular sets over the alphabet $\Sigma$, and 
there exist well-known algorithms which construct Büchi automata recognising 
these sets. This is all we need to know about these logics in this paper in order 
to give model-checking algorithms for pushdown systems. 

Let $\mathcal{P} = (P, \Gamma, \Delta)$ be a pushdown system, and let $\Lambda : (P \times \Gamma) \to \Sigma$ be a 
labelling function, which intuitively associates to a pair $(p, \gamma)$ the set of 
propositions that are true of it. We extend this mapping to arbitrary configurations: 
$(p, \gamma w)$ satisfies an atomic proposition if $(p, \gamma)$ does.$^1$ 

Given a formula $\varphi$ of such an $\omega$-regular logic we wish to solve these problems: 

- The global model-checking problem: compute the set of configurations, 
reachable or not, that violate $\varphi$. 

- The global model-checking problem for reachable configurations: compute 
the set of reachable configurations that violate $\varphi$. 

Our solution to these problems uses the automata-theoretic approach. We 
start by constructing a Büchi automaton $\mathcal{B} = (\Sigma, Q, \delta, q_0, F)$ corresponding to 
the negation of $\varphi$. The product of $\mathcal{P}$ and $\mathcal{B}$ yields a Büchi pushdown system 
$\mathcal{BP} = ((P \times Q), \Gamma, \Delta', G)$, where 

- $((p,q),\gamma) \hookrightarrow ((p',q'),w) \in \Delta'$ if $(p,\gamma) \hookrightarrow (p',w)$, $q \xrightarrow{\sigma} q'$, and $\sigma \subseteq \Lambda((p,\gamma))$. 

- $(p,q) \in G$ if $q \in F$. 



$^1$ We could also define $\Lambda : P \to \Sigma$, but our definition is more general, and it is also 
the one we need for our applications. 

The global model checking problem reduces to the accepting run problem: 

Compute the set $C_a$ of configurations $c$ of $\mathcal{BP}$ such that $\mathcal{BP}$ has an 
accepting run starting from $c$ (i.e., a run which visits infinitely often 
configurations with control locations in $G$). 

Notice that the emptiness problem of Büchi pushdown systems (whether the 
initial configuration has an accepting run) also reduces to the accepting run 
problem; it suffices to check if the initial configuration belongs to the set $C_a$. 
The following proposition characterises the configurations from which there are 
accepting runs. 

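Definition 1. Let $\mathcal{BP} = (P, \Gamma, \Delta, G)$ be a Büchi pushdown system. 

The relation $\stackrel{r}{\Rightarrow}$ between configurations of $\mathcal{BP}$ is defined as follows: $c \stackrel{r}{\Rightarrow} c'$ 
if $c \Rightarrow (g,u) \stackrel{+}{\Rightarrow} c'$ for some configuration $(g,u)$ with $g \in G$. 

The head of a transition rule $(p, \gamma) \hookrightarrow (p',w)$ is the configuration $(p, \gamma)$. A 
head $(p, \gamma)$ is repeating if there exists $v \in \Gamma^*$ such that $(p, \gamma) \stackrel{r}{\Rightarrow} (p, \gamma v)$. The 
sets of heads and repeating heads of $\mathcal{BP}$ are denoted by $H$ and $R$, respectively. 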



Proposition 1. [1] Let $c$ be a configuration of a Büchi pushdown system $\mathcal{BP} = 
(P, \Gamma, \Delta, G)$ and let $R\Gamma^*$ denote the set $\{ (p,\gamma w) \mid (p, \gamma) \in R, w \in \Gamma^* \}$. $\mathcal{BP}$ has 
an accepting run starting from $c$ if and only if $c \in pre^*(R\Gamma^*)$. 

Proposition 1 reduces the global model-checking problem to computing the 
set $pre^*(R\Gamma^*)$; the global model-checking problem for reachable configurations 
reduces to computing $post^*(\{c\}) \cap pre^*(R\Gamma^*)$ for a given initial configuration $c$. 

In the next sections we present a solution to these problems. We first recall 
the algorithm of [1] that computes $pre^*(C)$ for an arbitrary regular language $C$ 
(observe that $R\Gamma^*$ is regular since $R$ is finite). Then we present a new algorithm 
for computing $R$, obtained by modifying the algorithm for $pre^*(C)$. We also 
present an algorithm for computing $post^*(C)$ which is needed to solve the 
model-checking problem for reachable configurations. 



4 Computing $pre^*(C)$ for a Regular Language $C$ 

Our input is an automaton $\mathcal{A}$ accepting $C$. Without loss of generality, we assume 
that $\mathcal{A}$ has no transition leading to an initial state. We compute $pre^*(C)$ as 
the language accepted by an automaton $\mathcal{A}_{pre^*}$ obtained from $\mathcal{A}$ by means of 
a saturation procedure. The procedure adds new transitions to $\mathcal{A}$, but no new 
states. New transitions are added according to the following saturation rule: 

If $(p, \gamma) \hookrightarrow (p', w)$ and $p' \xrightarrow{w} q$ in the current automaton, 
add a transition $(p, \gamma, q)$. 

$\Delta = \{ (p_0,\gamma_0) \hookrightarrow (p_1,\gamma_1\gamma_0),\ 
(p_1,\gamma_1) \hookrightarrow (p_2,\gamma_2\gamma_0),\ 
(p_2,\gamma_2) \hookrightarrow (p_0,\gamma_1),\ 
(p_0,\gamma_1) \hookrightarrow (p_0,\varepsilon) \}$ 

Fig. 1. The automata $\mathcal{A}$ (left) and $\mathcal{A}_{pre^*}$ (right) 



Notice that all new transitions start at initial states. Let us illustrate the 
procedure by an example. Let $\mathcal{P} = (P, \Gamma, \Delta)$ be a pushdown system with $P = 
\{p_0, p_1, p_2\}$ and $\Delta$ as shown in the left half of Figure 1. Let $\mathcal{A}$ be the automaton 
that accepts the set $C = \{(p_0, \gamma_0\gamma_0)\}$, also shown in the figure. The result of the 
algorithm is shown in the right half of Figure 1. 

The saturation procedure eventually reaches a fixpoint because the number 
of possible new transitions is finite. Correctness was proved in [1]. 
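
The saturation rule can be tried out directly on this example. The following is a naive sketch, not an efficient implementation: it simply re-applies the rule until no transition can be added. The string names (`g0` for $\gamma_0$, `s1` and `s2` for the two non-initial states of the example automaton) and the helper names `reach` and `pre_star_naive` are assumptions of this illustration.

```python
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, str], Tuple[str, Tuple[str, ...]]]   # ((p, gamma), (p', w))
Transition = Tuple[str, str, str]                             # (q, gamma, q')

RULES: List[Rule] = [
    (("p0", "g0"), ("p1", ("g1", "g0"))),
    (("p1", "g1"), ("p2", ("g2", "g0"))),
    (("p2", "g2"), ("p0", ("g1",))),
    (("p0", "g1"), ("p0", ())),
]
# Automaton accepting the single configuration (p0, g0 g0).
DELTA: Set[Transition] = {("p0", "g0", "s1"), ("s1", "g0", "s2")}

def reach(delta: Set[Transition], state: str, word: Tuple[str, ...]) -> Set[str]:
    """All states q with state --word--> q in the current automaton."""
    current = {state}
    for a in word:
        current = {q2 for (q1, b, q2) in delta if q1 in current and b == a}
    return current

def pre_star_naive(rules: List[Rule], delta: Set[Transition]) -> Set[Transition]:
    """Apply the saturation rule until a fixpoint is reached."""
    delta = set(delta)
    changed = True
    while changed:
        changed = False
        for (p, gamma), (p2, w) in rules:
            for q in reach(delta, p2, w):
                if (p, gamma, q) not in delta:
                    delta.add((p, gamma, q))
                    changed = True
    return delta
```

On the example, the fixpoint contains the two original transitions plus five new ones, all of which start at an initial state.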



5 Computing the Set R of Repeating Heads 

We provide an algorithm more efficient than that of [1]. The problem of finding 
the repeating heads in a Büchi pushdown system is reduced to a graph-theoretic 
problem. More precisely, given a Büchi pushdown system $\mathcal{BP} = (P, \Gamma, \Delta, G)$ 
we construct a head reachability graph $\mathcal{G} = ((P \times \Gamma), E)$ whose nodes are the 
heads of $\mathcal{BP}$. The set of edges $E \subseteq (P \times \Gamma) \times \{0, 1\} \times (P \times \Gamma)$ generates the 
reachability relation between heads. Define $G(p) = 1$ if $p \in G$ and $G(p) = 0$ 
otherwise. $E$ consists of exactly the following edges: 

If $(p, \gamma) \hookrightarrow (p'', v_1 \gamma' v_2)$ and $(p'', v_1) \Rightarrow (p', \varepsilon)$, then $((p, \gamma), G(p), (p', \gamma')) \in E$. 

If, moreover, $(p'', v_1) \stackrel{r}{\Rightarrow} (p', \varepsilon)$, then $((p, \gamma), 1, (p', \gamma')) \in E$. 

The reachability relation $\to\ \subseteq (P \times \Gamma) \times \{0, 1\} \times (P \times \Gamma)$ is then defined as the 
smallest relation satisfying 

- $(p,\gamma) \xrightarrow{0} (p,\gamma)$ for every $(p,\gamma) \in (P \times \Gamma)$. 

- If $((p,\gamma), b, (p',\gamma')) \in E$, then $(p,\gamma) \xrightarrow{b} (p',\gamma')$. 

- If $((p,\gamma), b, (p',\gamma')) \in E$ and $(p',\gamma') \xrightarrow{b'} (p'',\gamma'')$, then $(p,\gamma) \xrightarrow{b \lor b'} (p'',\gamma'')$. 

Once the graph is constructed, $R$ can be computed by exploiting the fact that 
some head $(p, \gamma)$ is repeating if and only if $(p, \gamma)$ is part of a strongly connected 
component of $\mathcal{G}$ which has an internal 1-labelled edge. The instances for which 
$(p,v) \Rightarrow (p',\varepsilon)$ holds can be found with a small modification of the algorithm 
for $pre^*$. 

Fig. 2. Left: The graph $\mathcal{G}$. Right: $\mathcal{A}_{post^*}$ 

Let $\mathcal{BP} = (P, \Gamma, \Delta, G)$ be a Büchi pushdown system with $P$, $\Gamma$ and $\Delta$ as in 
the previous example and $G = \{p_2\}$. The left part of Figure 2 shows the graph $\mathcal{G}$. 



6 Computing $post^*(C)$ for a Regular Set $C$ 



We provide a solution for the case in which each transition rule $(p, \gamma) \hookrightarrow (p', w)$ 
of $\Delta$ satisfies $|w| \le 2$. This restriction is not essential; our solution can easily 
be extended to the general case. Moreover, any pushdown system can be 
transformed into an equivalent one in this form, and the pushdown systems in the 
application discussed in Section 9 directly satisfy this condition. 

Our input is an automaton $\mathcal{A}$ accepting $C$. Without loss of generality, we 
assume that $\mathcal{A}$ has no transition leading to an initial state. We compute $post^*(C)$ 
as the language accepted by an automaton $\mathcal{A}_{post^*}$ with $\varepsilon$-moves. We denote the 
relation $(\xrightarrow{\varepsilon})^* \xrightarrow{\gamma} (\xrightarrow{\varepsilon})^*$ by $\stackrel{\gamma}{\Longrightarrow}$. $\mathcal{A}_{post^*}$ is obtained from $\mathcal{A}$ in two stages: 

- Add to $\mathcal{A}$ a new state $r$ for each transition rule $r \in \Delta$ of the form $(p, \gamma) \hookrightarrow 
(p', \gamma'\gamma'')$, and a transition $(p', \gamma', r)$. 

- Add new transitions to $\mathcal{A}$ according to the following saturation rules: 

If $(p, \gamma) \hookrightarrow (p', \varepsilon) \in \Delta$ and $p \stackrel{\gamma}{\Longrightarrow} q$ in the current automaton, 
add a transition $(p', \varepsilon, q)$. 

If $(p, \gamma) \hookrightarrow (p', \gamma') \in \Delta$ and $p \stackrel{\gamma}{\Longrightarrow} q$ in the current automaton, 
add a transition $(p', \gamma', q)$. 

If $r = (p, \gamma) \hookrightarrow (p', \gamma'\gamma'') \in \Delta$ and $p \stackrel{\gamma}{\Longrightarrow} q$ in the current automaton, 
add a transition $(r, \gamma'', q)$. 



Consider again the pushdown system $\mathcal{P}$ and the automaton $\mathcal{A}$ from Figure 1. 
Then the automaton shown in the right part of Figure 2 is the result of the 
algorithm above and accepts $post^*(\{(p_0, \gamma_0\gamma_0)\})$. 

Algorithm 1 

Input: a pushdown system $\mathcal{P} = (P, \Gamma, \Delta)$ in normal form; 
       a $\mathcal{P}$-automaton $\mathcal{A} = (\Gamma, Q, \delta, P, F)$ without transitions into $P$ 
Output: the set of transitions of $\mathcal{A}_{pre^*}$ 

$rel \leftarrow \emptyset$; $trans \leftarrow \delta$; $\Delta' \leftarrow \emptyset$; 
for all $(p, \gamma) \hookrightarrow (p', \varepsilon) \in \Delta$ do $trans \leftarrow trans \cup \{(p, \gamma, p')\}$; 
while $trans \neq \emptyset$ do 
    pop $t = (q, \gamma, q')$ from $trans$; 
    if $t \notin rel$ then 
        $rel \leftarrow rel \cup \{t\}$; 
        for all $(p_1, \gamma_1) \hookrightarrow (q, \gamma) \in (\Delta \cup \Delta')$ do 
            $trans \leftarrow trans \cup \{(p_1, \gamma_1, q')\}$; 
        for all $(p_1, \gamma_1) \hookrightarrow (q, \gamma\gamma_2) \in \Delta$ do 
            $\Delta' \leftarrow \Delta' \cup \{(p_1, \gamma_1) \hookrightarrow (q', \gamma_2)\}$; 
            for all $(q', \gamma_2, q'') \in rel$ do 
                $trans \leftarrow trans \cup \{(p_1, \gamma_1, q'')\}$; 
return $rel$ 

7 Efficient Algorithms 

In this section we present efficient implementations of the abstract algorithms 
given in sections 4 through 6. We restrict ourselves to pushdown systems which 
satisfy $|w| \le 2$ for every rule $(p, \gamma) \hookrightarrow (p', w)$; any pushdown system can be put 
into such a normal form with linear size increase. 

7.1 Computing $pre^*(C)$ 

Given an automaton $\mathcal{A}$ accepting the set of configurations $C$, we compute 
$pre^*(C)$ by constructing the automaton $\mathcal{A}_{pre^*}$. 

Algorithm 1 computes the transitions of $\mathcal{A}_{pre^*}$, implementing the saturation 
rule from section 4. The sets $rel$ and $trans$ contain the transitions that are known 
to belong to $\mathcal{A}_{pre^*}$; $rel$ contains the transitions that have already been examined. 
No transition is examined more than once. 

The idea of the algorithm is to avoid unnecessary operations. When we have 
a rule $(p, \gamma) \hookrightarrow (p', \gamma'\gamma'')$, we look out for pairs of transitions $t_1 = (p', \gamma', q')$ and 
$t_2 = (q', \gamma'', q'')$ (where $q', q''$ are arbitrary states) so that we may insert $(p, \gamma, q'')$, 
but we don’t know in which order such transitions appear in $trans$. If every 
time we see a transition like $t_2$ we check the existence of $t_1$, many checks might 
be negative and waste time to no avail. However, once we see $t_1$ we know that all 
subsequent transitions $(q', \gamma'', q'')$ must lead to $(p, \gamma, q'')$. It so happens that the 
introduction of an extra rule $(p, \gamma) \hookrightarrow (q', \gamma'')$ is enough to take care of just these 
cases. We collect these extra rules in a set called $\Delta'$; this notation should make 
it clear that the pushdown system itself is not changed. $\Delta'$ is merely needed for 
the computation and can be thrown away afterwards. 

For a better illustration, consider again the example shown in Figure 1. 

The initialisation phase evaluates the $\varepsilon$-rules and adds $(p_0, \gamma_1, p_0)$. When 
the latter is taken from $trans$, the rule $(p_2, \gamma_2) \hookrightarrow (p_0, \gamma_1)$ is evaluated and 
$(p_2, \gamma_2, p_0)$ is added. This, in combination with $(p_0, \gamma_0, s_1)$ and the rule $(p_1, \gamma_1) \hookrightarrow 
(p_2, \gamma_2\gamma_0)$, leads to $(p_1, \gamma_1, s_1)$, and $\Delta'$ now contains $(p_1, \gamma_1) \hookrightarrow (p_0, \gamma_0)$. We now 
have $p_1 \xrightarrow{\gamma_1} s_1 \xrightarrow{\gamma_0} s_2$, so the next step adds $(p_0, \gamma_0, s_2)$, and $\Delta'$ is extended by 
$(p_0, \gamma_0) \hookrightarrow (s_1, \gamma_0)$. Because of $\Delta'$, $(p_0, \gamma_0, s_2)$ leads to $(p_1, \gamma_1, s_2)$. Finally, $\Delta'$ is 
extended by $(p_0, \gamma_0) \hookrightarrow (s_2, \gamma_0)$, but no other transitions can be added and the 
algorithm terminates. 
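
The worklist procedure described above can be sketched in Python as follows. This is an illustrative transcription under assumptions, not the implementation used for the experiments: rules and transitions are plain tuples, `extra` plays the role of $\Delta'$, `g0` stands for $\gamma_0$, and rules are found by linear scans, so the sketch does not attain the complexity bounds, which would require indexing rules by their right-hand sides.

```python
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, str], Tuple[str, Tuple[str, ...]]]   # ((p, gamma), (p', w))
Transition = Tuple[str, str, str]                             # (q, gamma, q')

RULES: List[Rule] = [
    (("p0", "g0"), ("p1", ("g1", "g0"))),
    (("p1", "g1"), ("p2", ("g2", "g0"))),
    (("p2", "g2"), ("p0", ("g1",))),
    (("p0", "g1"), ("p0", ())),
]
DELTA: Set[Transition] = {("p0", "g0", "s1"), ("s1", "g0", "s2")}

def pre_star(rules: List[Rule], delta: Set[Transition]) -> Set[Transition]:
    rel: Set[Transition] = set()          # transitions already examined
    trans: List[Transition] = list(delta) # transitions still to examine
    extra: Set[Rule] = set()              # plays the role of Delta'
    for (p, gamma), (p2, w) in rules:     # initialisation: evaluate pop rules
        if len(w) == 0:
            trans.append((p, gamma, p2))
    while trans:
        t = trans.pop()
        if t in rel:
            continue
        rel.add(t)
        q, gamma, q2 = t
        # Rules (original or extra) whose right-hand side is exactly (q, gamma).
        for (p1, g1), (rq, w) in list(rules) + list(extra):
            if rq == q and w == (gamma,):
                trans.append((p1, g1, q2))
        # Push rules whose right-hand side starts with (q, gamma ...).
        for (p1, g1), (rq, w) in rules:
            if rq == q and len(w) == 2 and w[0] == gamma:
                extra.add(((p1, g1), (q2, (w[1],))))
                for (a, b, c) in rel:
                    if a == q2 and b == w[1]:
                        trans.append((p1, g1, c))
    return rel
```

Calling `pre_star(RULES, DELTA)` on the example yields the seven transitions mentioned in the walkthrough, regardless of the order in which `trans` is processed.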

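Theorem 1. Let $\mathcal{P} = (P, \Gamma, \Delta)$ be a pushdown system and $\mathcal{A} = (\Gamma, Q, \delta, P, F)$ 
be an automaton. There exists an automaton $\mathcal{A}_{pre^*}$ recognising $pre^*(Conf(\mathcal{A}))$. 
Moreover, $\mathcal{A}_{pre^*}$ can be constructed in $O(n_Q^2 n_\Delta)$ time and $O(n_Q n_\Delta + n_\delta)$ space, 
where $n_Q = |Q|$, $n_\delta = |\delta|$, and $n_\Delta = |\Delta|$. 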

Observe that a naive implementation of the abstract procedure of section 4 
leads to an 0 {n“^n^) time and 0 {n'pnX) space algorithm, where n-p = |P| + |Z\|, 
and nji = \Q\ + |i5|. 

7.2 Computing the Set of Repeating Heads 

Given a Büchi pushdown system $(P, \Gamma, \Delta, G)$ we want to compute the set $R$ 
introduced in section 3, i.e. the set of transition heads $(p, \gamma)$ that satisfy 
$(p, \gamma) \stackrel{r}{\Rightarrow} (p, \gamma v)$ for some $v \in \Gamma^*$. 

Algorithm 2 runs in two phases. In the first phase, the cases for which 
$(p, w) \Rightarrow (p', \varepsilon)$ holds are computed. To this end, we employ the algorithm for 
$pre^*$ on the set $\{ (p, \varepsilon) \mid p \in P \}$. Then every resulting transition $(p, \gamma, p')$ signifies 
that $(p, \gamma) \Rightarrow (p', \varepsilon)$ holds. 

However, we also need the information whether $(p, \gamma) \stackrel{r}{\Rightarrow} (p', \varepsilon)$ holds. To 
this end, we enrich the automaton’s alphabet; instead of transitions of the form 
$(p, \gamma, p')$ we now have transitions $(p, [\gamma, b], p')$ where $b$ is a boolean. The meaning 
of a transition $(p, [\gamma, 1], p')$ should be that $(p, \gamma) \stackrel{r}{\Rightarrow} (p', \varepsilon)$. 

The second phase of the algorithm constructs the graph $\mathcal{G}$ using the results of 
the first phase. Finally, Tarjan’s algorithm [10] can be used to find the strongly 
connected components of $\mathcal{G}$ and thus to determine the repeating heads. 

In the example from Figure 2, the components of $\mathcal{G}$ are $\{(p_0, \gamma_0), (p_1, \gamma_1)\}$, 
$\{(p_0, \gamma_1)\}$, and $\{(p_2, \gamma_2)\}$. Of these, the first one has an internal 1-edge, meaning 
that $(p_0, \gamma_0)$ and $(p_1, \gamma_1)$ are the repeating heads of this example. 
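
The two phases can be sketched in Python as follows. Several choices here are assumptions of this illustration rather than part of the paper: transitions $(p, [\gamma, b], p')$ are encoded as 4-tuples, labelled extra rules as 5-tuples, and, to keep the sketch short, Tarjan's algorithm is replaced by a simple (quadratic) mutual-reachability check: a head is reported if it lies in the same strongly connected component as both endpoints of some 1-labelled edge.

```python
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, str], Tuple[str, Tuple[str, ...]]]   # ((p, gamma), (p', w))

RULES: List[Rule] = [
    (("p0", "g0"), ("p1", ("g1", "g0"))),
    (("p1", "g1"), ("p2", ("g2", "g0"))),
    (("p2", "g2"), ("p0", ("g1",))),
    (("p0", "g1"), ("p0", ())),
]

def repeating_heads(rules: List[Rule], accepting: Set[str]) -> Set[Tuple[str, str]]:
    def G(p: str) -> bool:
        return p in accepting

    # Phase 1: labelled pre* saturation starting from the pop rules.
    rel = set()                    # examined transitions (p, gamma, b, p')
    extra = set()                  # labelled extra rules (p1, g1, b, p', g2)
    trans = [(p, g, G(p), p2) for (p, g), (p2, w) in rules if len(w) == 0]
    while trans:
        t = trans.pop()
        if t in rel:
            continue
        rel.add(t)
        p, g, b, p2 = t
        for (p1, g1), (q, w) in rules:
            if q == p and w == (g,):
                trans.append((p1, g1, b or G(p1), p2))
        for (p1, g1, b1, q, g2) in list(extra):
            if q == p and g2 == g:
                trans.append((p1, g1, b or b1, p2))
        for (p1, g1), (q, w) in rules:
            if q == p and len(w) == 2 and w[0] == g:
                label = b or G(p1)
                extra.add((p1, g1, label, p2, w[1]))
                for (a, g3, b3, c) in rel:
                    if a == p2 and g3 == w[1]:
                        trans.append((p1, g1, label or b3, c))

    # Phase 2: edges of the head reachability graph.
    edges = set()
    for (p, g), (p2, w) in rules:
        if w:
            edges.add(((p, g), G(p), (p2, w[0])))
    for (p1, g1, label, p2, g2) in extra:
        edges.add(((p1, g1), label, (p2, g2)))

    nodes = {n for (u, _, v) in edges for n in (u, v)}
    succ = {n: set() for n in nodes}
    for (u, _, v) in edges:
        succ[u].add(v)

    def reach(start):
        seen, todo = {start}, [start]
        while todo:
            for v in succ[todo.pop()]:
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
        return seen

    reachable = {n: reach(n) for n in nodes}
    # h is repeating if some 1-edge (u, v) lies on a cycle through h.
    return {h for h in nodes for (u, b, v) in edges
            if b and u in reachable[h] and h in reachable[v]}
```

With the accepting control locations set to `{"p2"}`, the sketch reports the two heads named in the example above.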

Theorem 2. Let $\mathcal{BP} = (P, \Gamma, \Delta, G)$ be a Büchi pushdown system. The set of 
repeating heads $R$ can be computed in $O(n_P^2 n_\Delta)$ time and $O(n_P n_\Delta)$ space, where 
$n_P = |P|$ and $n_\Delta = |\Delta|$. 

A direct implementation of the procedure of [1] for computing the repeating 
heads leads to 0 {n^p) time and 0 {n^p) space, where nsv = jPj + j^j. 

7.3 Computing $post^*(C)$ 

Given a regular set of configurations $C$, we want to compute $post^*(C)$, i.e. the 
set of successors of $C$. Without loss of generality, we assume that $\mathcal{A}$ has no 
$\varepsilon$-transitions. 




240 J. Esparza et al. 



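Algorithm 2 

Input: a Büchi pushdown system $\mathcal{BP} = (P, \Gamma, \Delta, G)$ in normal form 
Output: the set of repeating heads in $\mathcal{BP}$ 

$rel \leftarrow \emptyset$; $trans \leftarrow \emptyset$; $\Delta' \leftarrow \emptyset$; 
for all $(p, \gamma) \hookrightarrow (p', \varepsilon) \in \Delta$ do 
    $trans \leftarrow trans \cup \{(p, [\gamma, G(p)], p')\}$; 
while $trans \neq \emptyset$ do 
    pop $t = (p, [\gamma, b], p')$ from $trans$; 
    if $t \notin rel$ then 
        $rel \leftarrow rel \cup \{t\}$; 
        for all $(p_1, \gamma_1) \hookrightarrow (p, \gamma) \in \Delta$ do 
            $trans \leftarrow trans \cup \{(p_1, [\gamma_1, b \lor G(p_1)], p')\}$; 
        for all $(p_1, \gamma_1) \stackrel{b'}{\hookrightarrow} (p, \gamma) \in \Delta'$ do 
            $trans \leftarrow trans \cup \{(p_1, [\gamma_1, b \lor b'], p')\}$; 
        for all $(p_1, \gamma_1) \hookrightarrow (p, \gamma\gamma_2) \in \Delta$ do 
            $\Delta' \leftarrow \Delta' \cup \{(p_1, \gamma_1) \stackrel{b \lor G(p_1)}{\hookrightarrow} (p', \gamma_2)\}$; 
            for all $(p', [\gamma_2, b'], p'') \in rel$ do 
                $trans \leftarrow trans \cup \{(p_1, [\gamma_1, b \lor b' \lor G(p_1)], p'')\}$; 

$R \leftarrow \emptyset$; $E \leftarrow \emptyset$; 
for all $(p, \gamma) \hookrightarrow (p', \gamma') \in \Delta$ do $E \leftarrow E \cup \{((p, \gamma), G(p), (p', \gamma'))\}$; 
for all $(p, \gamma) \stackrel{b}{\hookrightarrow} (p', \gamma') \in \Delta'$ do $E \leftarrow E \cup \{((p, \gamma), b, (p', \gamma'))\}$; 
for all $(p, \gamma) \hookrightarrow (p', \gamma'\gamma'') \in \Delta$ do $E \leftarrow E \cup \{((p, \gamma), G(p), (p', \gamma'))\}$; 
find strongly connected components in $\mathcal{G} = ((P \times \Gamma), E)$; 
for all components $C$ do 
    if $C$ has a 1-edge then $R \leftarrow R \cup C$; 
return $R$ 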



Algorithm 3 calculates the transitions of $\mathcal{A}_{post^*}$, implementing the saturation 
rule from section 6. The approach is in some ways similar to the solution for $pre^*$; 
again we use $trans$ and $rel$ to store the transitions that we need to examine. Note 
that transitions from states outside of $P$ go directly to $rel$ since these states 
cannot occur in rules. 

The algorithm is very straightforward. We start by including the transitions 
of $\mathcal{A}$; then, for every transition that is known to belong to $\mathcal{A}_{post^*}$, we find its 
successors. A noteworthy difference to the algorithm in section 6 is the treatment 
of $\varepsilon$-moves: $\varepsilon$-transitions are eliminated and simulated with non-$\varepsilon$-transitions; we 
maintain the sets $eps(q)$ for every state $q$ with the meaning that whenever there 
should be an $\varepsilon$-transition going from $p$ to $q$, $eps(q)$ contains $p$. 

Again, consider the example in Figure 2. In that example, $m_1$ is the node 
associated with the rule $(p_0, \gamma_0) \hookrightarrow (p_1, \gamma_1\gamma_0)$ and $m_2$ is associated with $(p_1, \gamma_1) \hookrightarrow 
(p_2, \gamma_2\gamma_0)$. The transitions $(p_1, \gamma_1, m_1)$ and $(m_1, \gamma_0, s_1)$ are a consequence of 
$(p_0, \gamma_0, s_1)$; the former leads to $(p_2, \gamma_2, m_2)$ and $(m_2, \gamma_0, m_1)$ and, in turn, to 
$(p_0, \gamma_1, m_2)$. Because of $(p_0, \gamma_1) \hookrightarrow (p_0, \varepsilon)$, we now need to simulate an $\varepsilon$-move 
from $p_0$ to $m_2$. This is done by making copies of all the transitions that leave $m_2$; 
in this example, $(m_2, \gamma_0, m_1)$ is copied and changed to $(p_0, \gamma_0, m_1)$. The latter 

Algorithm 3 

Input: a pushdown system $\mathcal{P} = (P, \Gamma, \Delta)$ in normal form; 
       a $\mathcal{P}$-automaton $\mathcal{A} = (\Gamma, Q, \delta, P, F)$ without transitions into $P$ 
Output: the automaton $\mathcal{A}_{post^*}$ 

$trans \leftarrow \delta \cap (P \times \Gamma \times Q)$; 
$rel \leftarrow \delta \setminus trans$; $Q' \leftarrow Q$; $F' \leftarrow F$; 
for all $r = (p, \gamma) \hookrightarrow (p', \gamma_1\gamma_2) \in \Delta$ do 
    $Q' \leftarrow Q' \cup \{q_r\}$; 
    $trans \leftarrow trans \cup \{(p', \gamma_1, q_r)\}$; 
for all $q \in Q'$ do $eps(q) \leftarrow \emptyset$; 
while $trans \neq \emptyset$ do 
    pop $t = (p, \gamma, q)$ from $trans$; 
    if $t \notin rel$ then 
        $rel \leftarrow rel \cup \{t\}$; 
        for all $(p, \gamma) \hookrightarrow (p', \varepsilon) \in \Delta$ do 
            if $p' \notin eps(q)$ then 
                $eps(q) \leftarrow eps(q) \cup \{p'\}$; 
                for all $(q, \gamma', q') \in rel$ do 
                    $trans \leftarrow trans \cup \{(p', \gamma', q')\}$; 
                if $q \in F'$ then $F' \leftarrow F' \cup \{p'\}$; 
        for all $(p, \gamma) \hookrightarrow (p', \gamma_1) \in \Delta$ do 
            $trans \leftarrow trans \cup \{(p', \gamma_1, q)\}$; 
        for all $r = (p, \gamma) \hookrightarrow (p', \gamma_1\gamma_2) \in \Delta$ do 
            $rel \leftarrow rel \cup \{(q_r, \gamma_2, q)\}$; 
            for all $p'' \in eps(q_r)$ do 
                $trans \leftarrow trans \cup \{(p'', \gamma_2, q)\}$; 
return $(\Gamma, Q', rel, P, F')$ 



finally leads to $(p_1, \gamma_1, m_1)$ and $(m_1, \gamma_0, m_1)$. Figure 3 shows the result, similar 
to Figure 2 but with the $\varepsilon$-transition resolved. 
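
The worklist computation of $post^*$ can be sketched in Python as follows. This is an illustrative transcription under assumptions: rules and transitions are tuples, the intermediate state for the $i$-th rule of the input list is named `m<i>` (so the two push rules of the example get `m1` and `m2`), `g0` stands for $\gamma_0$, and rule lookup is a linear scan rather than an indexed structure. The function returns only the transition set and the final states.

```python
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, str], Tuple[str, Tuple[str, ...]]]   # ((p, gamma), (p', w))
Transition = Tuple[str, str, str]                             # (q, gamma, q')

RULES: List[Rule] = [
    (("p0", "g0"), ("p1", ("g1", "g0"))),
    (("p1", "g1"), ("p2", ("g2", "g0"))),
    (("p2", "g2"), ("p0", ("g1",))),
    (("p0", "g1"), ("p0", ())),
]
DELTA: Set[Transition] = {("p0", "g0", "s1"), ("s1", "g0", "s2")}

def post_star(rules: List[Rule], delta: Set[Transition],
              control: Set[str], final: Set[str]):
    trans = [t for t in delta if t[0] in control]       # still to examine
    rel = {t for t in delta if t[0] not in control}     # already settled
    final = set(final)
    eps = {}                                            # eps[q]: sources of eps-moves into q
    mid = {}                                            # push rule -> intermediate state
    for i, rule in enumerate(rules, start=1):
        (_, (p2, w)) = rule
        if len(w) == 2:
            mid[rule] = f"m{i}"
            trans.append((p2, w[0], mid[rule]))
    while trans:
        t = trans.pop()
        if t in rel:
            continue
        rel.add(t)
        p, gamma, q = t
        for rule in rules:
            (head, (p2, w)) = rule
            if head != (p, gamma):
                continue
            if len(w) == 0:                             # pop rule: eps-move p2 -> q
                if p2 not in eps.setdefault(q, set()):
                    eps[q].add(p2)
                    for (a, g2, c) in list(rel):        # copy transitions leaving q
                        if a == q:
                            trans.append((p2, g2, c))
                    if q in final:
                        final.add(p2)
            elif len(w) == 1:                           # replace topmost symbol
                trans.append((p2, w[0], q))
            else:                                       # push rule
                q_r = mid[rule]
                rel.add((q_r, w[1], q))
                for p3 in eps.get(q_r, ()):
                    trans.append((p3, w[1], q))
    return rel, final
```

On the example, the result contains nine transitions, including the copy `("p0", "g0", "m1")` that simulates the $\varepsilon$-move from `p0` to `m2`.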



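Theorem 3. Let $\mathcal{P} = (P, \Gamma, \Delta)$ be a pushdown system, and $\mathcal{A} = (\Gamma, Q, \delta, P, F)$ 
be an automaton. There exists an automaton $\mathcal{A}_{post^*}$ recognising $post^*(Conf(\mathcal{A}))$. 
Moreover, $\mathcal{A}_{post^*}$ can be constructed in $O(n_P n_\Delta (n_Q + n_\Delta) + n_P n_\delta)$ time and 
space, where $n_P = |P|$, $n_\Delta = |\Delta|$, $n_Q = |Q|$, and $n_\delta = |\delta|$. 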

In [7] the same problem was considered (with different restrictions on the 
rules in the pushdown system). The complexity of the $post^*$ computation for the 
initial configuration was given as $O(n_{\mathcal{P}}^3)$ where $n_{\mathcal{P}}$ translates to $n_P + n_\Delta$. An 
extension to compute $post^*(C)$ for arbitrary regular sets $C$ is also proposed. The 
different restrictions on the pushdown rules make a more detailed comparison 
difficult, but it is safe to say that our algorithm is at least as good as the one 
in [7]. Also, we give an explicit bound for the computation of $post^*$ for arbitrary 
regular sets of configurations. 

Fig. 3. $\mathcal{A}_{post^*}$ as computed by Algorithm 3. 



8 The Model-Checking Problem 

Using the results from the previous section we can now compute the complexity 
of the problems presented in section 3. The following steps are necessary to solve 
the global model-checking problem for a given formula $\varphi$: 

- Construct the Büchi automaton $\mathcal{B} = (\Sigma, Q, \delta, q_0, F)$ corresponding to $\neg\varphi$. 

- Compute $\mathcal{BP}$ as the product of $\mathcal{B}$ and the pushdown system $\mathcal{P} = (P, \Gamma, \Delta)$. 

- Compute the set of repeating heads $R$ of $\mathcal{BP}$. 

- Construct an automaton $\mathcal{A}$ accepting $R\Gamma^*$. $\mathcal{A}$ has one final state $r$ and 
contains transitions $(p, \gamma, r)$ for every $(p, \gamma) \in R$ and $(r, \gamma, r)$ for every $\gamma \in \Gamma$. 

- Compute $\mathcal{A}_{pre^*}$. A configuration $(p, w)$ violates $\varphi$ exactly if $((p, q_0), w)$ is 
accepted by $\mathcal{A}_{pre^*}$. 
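
The fourth step is simple enough to spell out. The sketch below builds the transition set of an automaton for $R\Gamma^*$ from a set of repeating heads and a stack alphabet; the function name, the tuple encoding and the name `"r"` for the single final state are assumptions of this illustration, and the state name must not clash with a control location.

```python
from typing import Set, Tuple

Transition = Tuple[str, str, str]     # (source state, stack symbol, target state)

def automaton_for_repeating_heads(repeating: Set[Tuple[str, str]],
                                  alphabet: Set[str],
                                  final_state: str = "r"):
    """Transitions and final states of an automaton accepting R Gamma^*."""
    delta: Set[Transition] = set()
    for (p, gamma) in repeating:      # read the head's stack symbol, move to r
        delta.add((p, gamma, final_state))
    for gamma in alphabet:            # r loops on every stack symbol
        delta.add((final_state, gamma, final_state))
    return delta, {final_state}

# Hypothetical input: two repeating heads over a three-symbol stack alphabet.
R = {("p0", "g0"), ("p1", "g1")}
GAMMA = {"g0", "g1", "g2"}
DELTA_R, FINAL_R = automaton_for_repeating_heads(R, GAMMA)
```

The resulting automaton has no transition leading into a control location, so it satisfies the assumption made on the input of the $pre^*$ computation.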

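The dominant factors in these computations are the time needed to compute 
the repeating heads, and the space needed to store $\mathcal{A}_{pre^*}$. 

Theorem 4. Let $g_{\mathcal{P}}$ denote the size of $\mathcal{P}$ and $g_{\mathcal{B}}$ the size of $\mathcal{B}$. The global 
model-checking problem can be solved in $O(g_{\mathcal{P}}^3 g_{\mathcal{B}}^3)$ time and $O(g_{\mathcal{P}}^2 g_{\mathcal{B}}^2)$ space. 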

In [7] an algorithm is presented for deciding if the initial configuration satisfies 
a given LTL property. (The problem of obtaining a representation for the set of 
configurations violating the property is not discussed.) The algorithm takes cubic 
time but also cubic space in the size of the pushdown system. More precisely, 
it is based on a saturation routine which requires $\Theta(g_{\mathcal{P}}^3 g_{\mathcal{B}}^3)$ space. Observe 
that the space consumption is $\Theta$, and not $O$. Our solution implies therefore 
an improvement in the space complexity without losses in time. To solve the 
problem for reachable configurations we need these additional steps: 

- Rename the states $(p, q_0)$ into $p$ for all $p \in P$ and take $P$ as the new set of 
initial states. 

- Compute $post^*$ for the initial configuration of $\mathcal{P}$. 

- Compute the intersection of $\mathcal{A}_{pre^*}$ and $\mathcal{A}_{post^*}$. The resulting automaton $\mathcal{A}_i$ 
accepts the set of reachable configurations violating $\varphi$. 

Theorem 5. The global model-checking problem for reachable configurations 
can be solved in $O(g_{\mathcal{P}}^4 g_{\mathcal{B}}^3)$ time and $O(g_{\mathcal{P}}^4 g_{\mathcal{B}}^2)$ space. 

9 Application 

As an application, we use pushdown systems to model sequential programs with 
procedures (written in C or Java, for instance). We concentrate on the control 
flow and abstract away information about data. The model establishes a 
relation between the control states of a program and the configurations of the 
corresponding pushdown system. 

The model is constructed in two steps. In the first step, we represent the 
program by a system of flow graphs, one for each procedure. The nodes of a flow 
graph correspond to control points in the procedure, and its edges are annotated 
with statements, e.g. calls to other procedures. Control flow is interpreted 
non-deterministically since we abstract from the values of variables. 

Given a system of flow graphs with a set $N$ of control points, we construct a 
pushdown system with $N$ as its stack alphabet. More precisely, a configuration 
$(p, nw)$, $w \in N^*$, represents the situation that execution is currently at control 
point $n$ where $w$ represents the return addresses of the calling procedures. 
Pushdown systems of this kind need only one single control state (called $p$ in the 
following). The transition rules of such a pushdown system are: 

- $(p, n) \hookrightarrow (p, n')$ if control passes from $n$ to $n'$ without a procedure call. 

- $(p, n) \hookrightarrow (p, f_0 n')$ if an edge between point $n$ and $n'$ contains a call to 
procedure $f$, assuming that $f_0$ is $f$’s entry point; $n'$ can be seen as the 
return address of that call. 

- $(p, n) \hookrightarrow (p, \varepsilon)$ if an edge leaving $n$ contains a return statement. 
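
The three kinds of rules can be generated mechanically from a flow graph. The sketch below assumes a hypothetical edge encoding `(n, n2, statement)` in which a statement is `("call", f)`, `("return",)` or `("stmt", text)`, and a map `entry` from procedure names to their entry points; none of these names come from the paper.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[Tuple[str, str], Tuple[str, Tuple[str, ...]]]   # ((p, n), (p, w))
Edge = Tuple[str, Optional[str], Tuple[str, ...]]            # (n, n', statement)

P = "p"   # the single control state

def flowgraph_rules(edges: List[Edge], entry: Dict[str, str]) -> List[Rule]:
    rules: List[Rule] = []
    for (n, n2, stmt) in edges:
        if stmt[0] == "call":                    # push entry point and return address
            rules.append(((P, n), (P, (entry[stmt[1]], n2))))
        elif stmt[0] == "return":                # pop the current control point
            rules.append(((P, n), (P, ())))
        else:                                    # ordinary statement: move to n'
            rules.append(((P, n), (P, (n2,))))
    return rules

# Hypothetical two-procedure program: main calls s and then loops forever.
EDGES: List[Edge] = [
    ("main0", "main1", ("call", "s")),
    ("main1", "main1", ("stmt", "loop")),
    ("s0", None, ("return",)),
]
ENTRY = {"main": "main0", "s": "s0"}
PDS_RULES = flowgraph_rules(EDGES, ENTRY)
```

Every generated right-hand side has at most two stack symbols, so the resulting system is already in the normal form assumed by the algorithms of section 7.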

Let us examine how the special structure of these systems affects the 
complexity of the model-checking problem. Under the assumption that the number 
of control states is one (or, more generally, constant), the number of states in 
the Büchi pushdown system $\mathcal{BP}$ only depends on the formula, and we get the 
following results: 

Theorem 6. If the number of control states in $\mathcal{P}$ is constant, the global 
model-checking problem can be solved in $O(g_{\mathcal{P}} g_{\mathcal{B}}^3)$ time and $O(g_{\mathcal{P}} g_{\mathcal{B}}^2)$ space, and the 
problem for reachable configurations in $O(g_{\mathcal{P}}^2 g_{\mathcal{B}}^3)$ time and $O(g_{\mathcal{P}}^2 g_{\mathcal{B}}^2)$ space. 



9.1 A Small Example 

As an example, consider the program in Figure 4. This program controls a 
plotter, creating random bar graphs via the commands go_up, go_right, and go_down. 
Among the correctness properties is the requirement that an upward movement 
should never be immediately followed by a downward movement and vice versa, 
which we shall verify in the following. 

The left side of Figure 5 displays the set of flow graphs created from the 
original program. Calls to external functions are treated as ordinary statements. 
(In this example we assume that the go_... functions are external.) The procedure 
main ends in an infinite loop which ensures that all executions are infinite. The 
right side shows the resulting pushdown system. The initial configuration is 
$(p, main_0)$. The desired properties can be expressed as follows: 

$G(up \to (\neg down \ U\ right))$ and $G(down \to (\neg up \ U\ right))$ 

where the atomic proposition up is true of configurations (p, m^w) and (p, S2w), 
i.e. those that correspond to program points in which go_up will be the next 
statement. Similarly, down is true of configurations with mg or 55 as the topmost 
stack symbol, and right is true of mi. Analysis of the program with our methods 
yields that both properties are fulfilled. 

9.2 Experimental Results 

Apart from the example, we solved the global model-checking problem on a 
series of randomly generated flow graphs. These flow graphs model programs 
with procedures. The structure of each statement was decided randomly; the 
average proportion of sequences, branches and loops was 0.6 : 0.2 : 0.2 (these 
numbers were taken from the literature). After generating one flow graph for each 
procedure we connected them by inserting procedure calls, making sure that each 
procedure was indeed reachable from the start. The formulas we checked were 
of the form $G(n \to F n')$, where $n$ and $n'$ were random control states. 

Table 1 lists the execution times in seconds for programs with an average 
of 20 and 40 lines per procedure, respectively. One fifth of the statements in these 
programs contained a procedure call. Results are given for programs with recursive 
and mutual procedure calls. The table lists the times needed to compute the sets of 
repeating heads, $pre^*$, and the total time for the model-checking, which includes 
several other tasks. In fact, in our experiments the majority of time was spent 
reading the pushdown system and computing the product with a Büchi 
automaton. Also, memory usage is given. All computations were carried out on an 
UltraSPARC 60 with a sufficient amount of memory. 

Obviously, these experiments can only give a rough impression of what the 
execution times would be like when analysing real programs. However, these 
experiments already constitute ‘stress tests’, i.e. their construction is based on 



void m() { 
    double d = drand48(); 
    if (d < 0.66) { 
        s(); go_right(); 
        if (d < 0.33) m(); 
    } else { 
        go_up(); m(); go_down(); 
    } 
} 

void s() { 
    if (drand48() < 0.5) return; 
    go_up(); m(); go_down(); 
} 

main() { 
    srand48(time(NULL)); 
    s(); 
} 

Fig. 4. An example program. 

Fig. 5. Flowgraph of the program in Figure 4 (left) and associated PDS (right) 

certain exaggerated assumptions which affect execution times negatively. For 
instance, the number of procedure calls is very high, and with mutual calls we 
allowed every procedure to call every other procedure (which greatly increases 
the time needed to find repeating heads). We have run preliminary experiments 
with three real programs of ten to thirty thousand lines, and the time for the 
model-checking was never more than a few seconds even for the largest program. 

10 Conclusions 

We have presented detailed algorithms for model-checking linear time logics 
on pushdown systems. The global model-checking problem can be solved in 
$O(g_P^3\,g_B^3)$ time and $O(g_P^2\,g_B^2)$ space. In the case of pushdown systems with one
single control state the problem can be solved in $O(g_P\,g_B^3)$ time and $O(g_P\,g_B^2)$
space. Our results improve on [7], where the model-checking problem (i.e., deciding
if an initial configuration satisfies a property, without computing the set of
configurations violating the property) was solved in O(n^) time but also O(n^) 
space in the size of the pushdown system. 

Our work needs to be carefully compared to that on branching time logics. 
For arbitrary pushdown systems, linear time logics can be checked in polynomial 
time in the size of the system, while checking CTL provably requires exponential 
time. However, in the case of pushdown systems with one single control state, a 
modal mu-calculus formula of alternation depth $k$ can be checked in time $O(n^k)$
[3,11], where $n$ is the size of the system. In this case, linear time formulas can
be checked in $O(n)$ time, and so we obtain linear complexity in the size of the
system for both linear time logics and for the alternation-free mu-calculus. 

The information provided by the branching-time and linear-time algorithms 
is different. Walukiewicz’s algorithm [11] only says whether the initial configu- 
ration satisfies the property or not. The algorithm of Burkart and Steffen [2,3] 
                 avg. 20 lines/procedure            avg. 40 lines/procedure
lines        RH    pre*   total    memory        RH    pre*   total    memory

recursive procedure calls
 1000      0.05    0.02    0.23    0.91 M      0.05    0.02    0.20    0.84 M
 2000      0.12    0.05    0.46    1.82 M      0.14    0.05    0.48    1.84 M
 5000      0.36    0.13    1.23    4.46 M      0.34    0.14    1.20    4.29 M
10000      0.76    0.26    2.55    8.79 M      0.74    0.30    2.52    8.62 M
20000      1.64    0.55     ...   17.56 M       ...    0.64    5.57       ...

mutual procedure calls
 1000      0.06    0.03     ...    0.97 M       ...    0.03     ...    0.90 M
 2000       ...     ...     ...       ...       ...     ...     ...       ...
 5000       ...     ...     ...       ...       ...     ...     ...       ...
10000       ...     ...     ...    9.68 M      0.95    0.39     ...       ...
20000      2.10    0.81    6.21   19.27 M      2.08    0.84    6.15   18.93 M

Table 1. Results for programs with recursive and mutual procedure calls
(RH: time to compute the repeating heads; times in seconds)



returns a set of predicate transformers, one for each stack symbol; the predicate
transformer for the symbol $X$ shows which formulas hold for a stack content $X\alpha$
as a function of the formulas that hold in $\alpha$.* Essentially, this allows one to
determine for each symbol $X$ whether there exists some stack content of the form $X\alpha$ that
violates the formula, but it does not tell which these stack contents are. Finally, our
algorithm returns a finite representation of the set of configurations that violate
the formula.

Whether one needs all the information provided by our algorithm or not 
depends on the application. For certain dataflow analysis problems we wish to 
compute the set of control locations p such that some reachable configuration 
(p, w) violates the property. This information can be efficiently computed by the 
algorithm of [2,3]. Sometimes we may need more. For instance, in [8] an approach 
is presented to the verification of control flow based security properties. Systems 
are modelled by pushdown automata, and security properties as properties that 
all reachable configurations should satisfy. It is necessary to determine which 
configurations violate the security property to modify the system accordingly. 
The model checking algorithm of [2,3] seems to be inadequate in this case. It may
be possible to modify the algorithm in order to obtain an automata representation
of the configurations that violate the formula. This is surely an interesting
research question.



* Not for arbitrary formulas, but only for those in the closure of the original formula
to be checked.
Abstract. We present an algorithm to generate small Büchi automata for LTL
formulae. We describe a heuristic approach consisting of three phases: rewriting of 
the formula, an optimized translation procedure, and simplification of the resulting 
automaton. We present a translation procedure that is optimal within a certain class 
of translation procedures. The simplification algorithm can be used for Büchi
automata in general. It reduces the number of states and transitions, as well as 
the number and size of the accepting sets — possibly reducing the strength of the 
resulting automaton. This leads to more efficient model checking of linear-time 
logic formulae. We compare our method to previous work, and show that it is 
significantly more efficient for both random formulae, and formulae in common 
use and from the literature. 



1 Introduction 

The standard approach to LTL model checking [18,14] consists of translating the negation 
of a given LTL formula into a Büchi automaton, and checking the product of the property
automaton and the model for language emptiness. The quality of the translation affects 
the resources required by the model checking experiment. This motivates the search for 
algorithms that generate efficient automata, i.e., automata with few states, few transitions, 
and simple acceptance conditions. 

The initial approaches to translation of LTL formulae were not designed to yield 
small automata. The process of [18] always yields the worst-case result of $O(2^n)$ states,
where n is the length of the formula. The approach of [11] was the first to produce 
automata that were not necessarily of worst-case size. In [9], a more efficient algorithm 
was proposed that works on-the-fly. A further improvement over this algorithm, based 
on syntactic simplification, was discussed in [5]. 

We present an approach to the generation of Büchi automata from LTL formulae that
extends the work of [9,5] in three ways:

1. It applies rewriting rules to the formula before translation.

2. It reduces the number of states generated by the translation by applying boolean
optimization techniques.

3. It simplifies both the transition structure and the acceptance conditions of the
resulting Büchi automaton.

* This work was supported in part by SRC contract 98-DJ-620 and NSF grant CCR-99-71195.
The last phase of our algorithm is independent of LTL, and can be applied whenever one
is interested in simplifying a Büchi automaton. Our algorithm is not designed to work
on-the-fly, since the automata generated by the algorithm are typically much smaller 
than the model. 

Both explicit and implicit model checking algorithms benefit from the simplification 
of the Büchi automata. Explicit techniques check emptiness of the product automata in
time proportional to the size of the property automaton. Symbolic algorithms need, in 
general, a number of symbolic operations that is quadratic in the size of the property 
automaton [4,12]. We can hence expect appreciable speedups in emptiness checks using 
either technique if we can reduce the size of the automaton significantly. 

The strength of a Büchi automaton [13,2] relates to the complexity of the procedure
required to symbolically model check the corresponding property. For a strong
automaton, an emptiness check requires the computation of a $\mu$-calculus formula of
alternation depth 2, which takes a number of preimage computations quadratic in the
size of the automaton. A weak automaton requires an alternation-free greatest fixpoint 
computation, and hence only linearly many preimage computations. Finally, a terminal 
automaton only requires reachability analysis, and is therefore amenable to on-the-fly 
model checking. Our procedure tends to produce results of lesser strength than results 
of previous algorithms. 

We compare our technique to those of [9] and [5]. The comparison is based on both 
random formulae, and a set of formulae that are either in common use or found in the 
literature. In both cases, it is significantly more efficient in terms of number of states, 
transitions, acceptance conditions, and strength of the resulting automaton. 

Etessami and Holzmann [8] have independently developed a similar approach. In
their approach, like ours, rewriting is followed by translation and simulation-based
optimization. Etessami and Holzmann perform rewriting using a set of rules that is
incomparable to ours. They do not expand on the translation stage. In the minimization stage
backward simulation is not employed. Their technique is geared towards non-generalized
automata.

The flow of this paper is as follows. Section 2 presents the preliminaries. Section 3 
covers rewriting of the LTL formulae. Section 4 describes how boolean optimization 
can be used to make the translation into automata more efficient. Then, Sections 5 and 6 
describe the simplification of the automaton by deleting arcs and states, and pruning 
acceptance conditions. Finally, Section 7 presents the experimental results, and Section 8 
concludes the paper. 



2 Preliminaries 

There are several variants of Büchi automata. We adopt automata with multiple acceptance
conditions and with labels on the states (as opposed to labels on the arcs) as in [9].

Definition 1. A labeled, generalized Büchi automaton is a six-tuple

$$\mathcal{A} = \langle Q, Q_0, \delta, \mathcal{F}, D, \mathcal{L} \rangle,$$

where $Q$ is the finite set of states, $Q_0 \subseteq Q$ is the set of initial states, $\delta : Q \to 2^Q$ is
the transition relation, $\mathcal{F} \subseteq 2^Q$ is the set of acceptance conditions (or fair sets), $D$ is a
finite domain, and $\mathcal{L} : Q \to 2^D$ is the labeling function.

A run of $\mathcal{A}$ is an infinite sequence $\rho = \rho_0, \rho_1, \ldots$ over $Q$, such that $\rho_0 \in Q_0$, and
for all $i \geq 0$, $\rho_{i+1} \in \delta(\rho_i)$. A run $\rho$ is accepting if for each $F_i \in \mathcal{F}$ there exists $q_j \in F_i$
that appears infinitely often in $\rho$.

The automaton accepts an infinite word $\sigma = \sigma_0, \sigma_1, \ldots$ in $D^\omega$ if there exists an
accepting run $\rho$ such that, for all $i \geq 0$, $\sigma_i \in \mathcal{L}(\rho_i)$. The language of $\mathcal{A}$, denoted by
$L(\mathcal{A})$, is the subset of $D^\omega$ accepted by $\mathcal{A}$.

We write $\mathcal{A}^q$ for the labeled, generalized Büchi automaton $\langle Q, \{q\}, \delta, \mathcal{F}, D, \mathcal{L} \rangle$.

Notice that with labels on the states we must allow multiple initial states. Multiple 
acceptance conditions, although not strictly necessary, simplify the translation. We will 
refer to a labeled generalized Büchi automaton simply as a Büchi automaton.

The temporal logic LTL is obtained from propositional logic by adding three temporal
operators: U (until), R (releases, the dual of U), and X (next). The familiar G and F
are defined as abbreviations, as are T (true) and F (false). Translation of an LTL formula
$\varphi$ into a Büchi automaton is accomplished by application of expansion rules also known
as tableau rules:

$$\psi_1\,\mathsf{U}\,\psi_2 = \psi_2 \vee (\psi_1 \wedge \mathsf{X}(\psi_1\,\mathsf{U}\,\psi_2)), \qquad \psi_1\,\mathsf{R}\,\psi_2 = \psi_2 \wedge (\psi_1 \vee \mathsf{X}(\psi_1\,\mathsf{R}\,\psi_2)).$$

The rules are applied to $\varphi$ until the resulting expression is a propositional formula in
terms of elementary subformulae of $\varphi$. An elementary formula is a constant, an atomic
proposition, or a formula starting with X. The expanded formula, put in disjunctive
normal form (DNF), is an elementary cover of $\varphi$. Each term of the cover identifies a
state of the automaton. The atomic propositions and their negations in the term define the
label of the state, that is, the conditions that the input word must satisfy in that state. The
remaining elementary subformulae of the term form the next part of the term; they are
LTL formulae that identify the obligations that must be fulfilled to obtain an accepting
run; they determine the transitions out of the state as well as the acceptance conditions.

The expansion process is applied to the next part of each state, creating new covers
until no new obligations are produced. In this way, a closed set of elementary covers is
obtained. The set is closed in the sense that there is an elementary cover in the set for the
next part of each term of each cover in the set. The automaton is obtained by connecting
each state to the states in the cover for its next part. The states in the elementary cover
of $\varphi$ are the initial states. Acceptance conditions are added to the automaton for each
elementary subformula of the form $\mathsf{X}(\psi_1\,\mathsf{U}\,\psi_2)$. The acceptance condition contains all
the states $s$ such that the label of $s$ does not imply $\psi_1\,\mathsf{U}\,\psi_2$ or the label of $s$ implies $\psi_2$.
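The expansion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of formulae, the tag `"nap"` for a negated atomic proposition, and the function names `expand`, `dnf`, and `cover` are hypothetical and do not come from the paper. Formulae are assumed to be in positive normal form, and no boolean minimization is attempted.

```python
# Formulas in positive normal form as nested tuples (an illustrative encoding):
# ("T",), ("F",), ("ap", name), ("nap", name) for a negated proposition,
# ("and", f, g), ("or", f, g), ("X", f), ("U", f, g), ("R", f, g).

def expand(f):
    """Apply the tableau rules until only elementary subformulae remain."""
    op = f[0]
    if op in ("T", "F", "ap", "nap", "X"):
        return f                                   # already elementary
    if op in ("and", "or"):
        return (op, expand(f[1]), expand(f[2]))
    if op == "U":                                  # f U g = g or (f and X(f U g))
        return ("or", expand(f[2]), ("and", expand(f[1]), ("X", f)))
    if op == "R":                                  # f R g = g and (f or X(f R g))
        return ("and", expand(f[2]), ("or", expand(f[1]), ("X", f)))
    raise ValueError(f"unknown operator: {op}")

def dnf(f):
    """Disjunctive normal form as a set of terms (frozensets of literals)."""
    op = f[0]
    if op == "T":
        return {frozenset()}
    if op == "F":
        return set()
    if op == "or":
        return dnf(f[1]) | dnf(f[2])
    if op == "and":
        return {a | b for a in dnf(f[1]) for b in dnf(f[2])}
    return {frozenset([f])}                        # elementary formula

def cover(f):
    """Elementary cover of f, with contradictory terms (p and not p) dropped."""
    return {t for t in dnf(expand(f))
            if not any(("nap", l[1]) in t for l in t if l[0] == "ap")}

F_p = ("U", ("T",), ("ap", "p"))        # F p abbreviates T U p
F_not_p = ("U", ("T",), ("nap", "p"))
print(len(cover(("and", F_p, F_not_p))))  # number of terms in the cover
```

Each term returned by `cover` plays the role of one state: its `"ap"`/`"nap"` literals give the label, and its `"X"` literals give the next part.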

3 Rewriting the Formula 

The first step towards an efficient translation is rewriting, a cheap, simple, and effective 
way to minimize the result of the translation. Prior to generation of the Büchi automaton, a
formula is put in positive normal form. Then we use the following identities and their 
duals to rewrite the formula, always replacing the left-hand side by the right-hand side. 
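$\varphi \leq \neg\psi \Rightarrow (\varphi \wedge \psi) = \mathbf{F}$
$(\mathsf{X}\varphi)\,\mathsf{U}\,(\mathsf{X}\psi) = \mathsf{X}(\varphi\,\mathsf{U}\,\psi)$
$(\varphi\,\mathsf{R}\,\psi) \wedge (\varphi\,\mathsf{R}\,r) = \varphi\,\mathsf{R}\,(\psi \wedge r)$
$(\varphi\,\mathsf{R}\,r) \vee (\psi\,\mathsf{R}\,r) = (\varphi \vee \psi)\,\mathsf{R}\,r$
$(\mathsf{X}\varphi) \wedge (\mathsf{X}\psi) = \mathsf{X}(\varphi \wedge \psi)$
$\mathsf{X}\,\mathbf{T} = \mathbf{T}$
$\varphi\,\mathsf{U}\,\mathbf{F} = \mathbf{F}$
$\varphi \leq \psi \Rightarrow (\varphi\,\mathsf{U}\,\psi) = \psi$
$\neg\psi \leq \varphi \Rightarrow (\varphi\,\mathsf{U}\,\psi) = (\mathbf{T}\,\mathsf{U}\,\psi)$

$\mathsf{G}\mathsf{F}\varphi \vee \mathsf{G}\mathsf{F}\psi = \mathsf{G}\mathsf{F}(\varphi \vee \psi)$
$\mathsf{F}\mathsf{X}\varphi = \mathsf{X}\mathsf{F}\varphi$
$\varphi \leq \psi \Rightarrow \varphi\,\mathsf{U}\,(\psi\,\mathsf{U}\,r) = \psi\,\mathsf{U}\,r$
$\mathsf{G}\mathsf{G}\mathsf{F}\varphi = \mathsf{G}\mathsf{F}\varphi$
$\mathsf{F}\mathsf{G}\mathsf{F}\varphi = \mathsf{G}\mathsf{F}\varphi$
$\mathsf{X}\mathsf{G}\mathsf{F}\varphi = \mathsf{G}\mathsf{F}\varphi$
$\mathsf{F}(\varphi \wedge \mathsf{G}\mathsf{F}\psi) = (\mathsf{F}\varphi) \wedge (\mathsf{G}\mathsf{F}\psi)$
$\mathsf{G}(\varphi \vee \mathsf{G}\mathsf{F}\psi) = (\mathsf{G}\varphi) \vee (\mathsf{G}\mathsf{F}\psi)$
$\mathsf{X}(\varphi \wedge \mathsf{G}\mathsf{F}\psi) = (\mathsf{X}\varphi) \wedge (\mathsf{G}\mathsf{F}\psi)$
$\mathsf{X}(\varphi \vee \mathsf{G}\mathsf{F}\psi) = (\mathsf{X}\varphi) \vee (\mathsf{G}\mathsf{F}\psi)$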



The rewriting rules have been chosen to eliminate redundancies and to reduce the size
of the resulting automaton. Checking for $\varphi \leq \psi$ is hard in general. Hence, we just look
for simple cases that can be detected by purely syntactic means. We use the following
set of rules and their duals.

$\varphi \leq \varphi$
$\varphi \leq \mathbf{T}$
$(\varphi \leq \psi) \wedge (\varphi \leq \chi) \Rightarrow \varphi \leq (\psi \wedge \chi)$
$(\varphi \leq \chi) \vee (\psi \leq \chi) \Rightarrow (\varphi \wedge \psi) \leq \chi$
$(\varphi \leq \chi) \wedge (\psi \leq \chi) \Rightarrow (\varphi\,\mathsf{U}\,\psi) \leq \chi$
$(\varphi \leq \chi) \wedge (\psi \leq s) \Rightarrow (\varphi\,\mathsf{U}\,\psi) \leq (\chi\,\mathsf{U}\,s)$

The first two rules are the terminal cases of a recursive procedure in which one applies
the remaining ones to decompose the problem.

Example 1. The rewriting rules transform the formula $(\mathsf{X}p\,\mathsf{U}\,\mathsf{X}q) \vee \neg\mathsf{X}(p\,\mathsf{U}\,q)$ into
$\mathbf{T}$. The automaton produced by our algorithm when rewriting is disabled is shown in
Figure 1. The optimal automaton obviously has only one state with a self-loop.
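A recursive syntactic implication test of the kind described in this section, together with a handful of the rewriting identities, might look as follows. This is a minimal sketch, not the authors' implementation: the tuple encoding and the names `leq` and `rewrite` are assumptions made for illustration, only three identities are applied, and the duals are omitted.

```python
# Illustrative encoding: ("T",), ("F",), ("ap", name), ("and", f, g),
# ("or", f, g), ("X", f), ("U", f, g), ("R", f, g).

def leq(f, g):
    """Sound but incomplete syntactic test for 'f implies g'."""
    if f == g or g == ("T",):                      # terminal cases
        return True
    if g[0] == "and" and leq(f, g[1]) and leq(f, g[2]):
        return True
    if f[0] == "and" and (leq(f[1], g) or leq(f[2], g)):
        return True
    if f[0] == "U" and g[0] == "U" and leq(f[1], g[1]) and leq(f[2], g[2]):
        return True
    if f[0] == "U" and leq(f[1], g) and leq(f[2], g):
        return True
    return False

def rewrite(f):
    """Bottom-up application of three identities, left-hand to right-hand side."""
    if f[0] in ("and", "or", "U", "R"):
        f = (f[0], rewrite(f[1]), rewrite(f[2]))
    elif f[0] == "X":
        f = ("X", rewrite(f[1]))
    if f[0] == "U":
        a, b = f[1], f[2]
        if b == ("F",):
            return ("F",)                          # a U F = F
        if leq(a, b):
            return b                               # a <= b  implies  a U b = b
        if a[0] == "X" and b[0] == "X":
            return ("X", rewrite(("U", a[1], b[1])))   # (X a) U (X b) = X(a U b)
    if f == ("X", ("T",)):
        return ("T",)                              # X T = T
    return f

p, q = ("ap", "p"), ("ap", "q")
print(rewrite(("U", ("X", p), ("X", q))))
print(rewrite(("U", ("and", p, q), q)))
```

Because `leq` only answers "yes" when one of the listed rules applies, a negative answer means "not detected", not "does not hold"; a rewriting step is simply skipped in that case.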




Fig. 1. Sub-optimal automaton for $(\mathsf{X}p\,\mathsf{U}\,\mathsf{X}q) \vee \neg\mathsf{X}(p\,\mathsf{U}\,q)$. Each node is annotated with its label
(first line), and the fair sets to which it belongs (second line). In the label, an overline indicates
negation, and concatenation indicates conjunction.



4 Reducing the Number of States via Boolean Optimization 



The LTL formula produced by rewriting is the input to the second phase of the procedure, 
which produces a closed set of elementary covers and constructs the Büchi automaton
from it. We refer to the covers in this set as the elementary covers of the automaton. An 







F. Somenzi and R. Bloem 



LTL formula has infinitely many elementary covers; which one is chosen affects the size 
of the resulting automaton by directly affecting what states are added to the automaton, 
and by determining what covers will belong to the closed set. 

By regarding each elementary formula as a literal, one can apply boolean optimization
techniques to minimize the cost of the elementary covers. (In this case, the
implication tests required to compute the fair sets must also be carried out with boolean 
techniques.) However, the best result is not always obtained by computing a minimum 
cost cover for each formula. The following example illustrates this case. 

Example 2. Consider translating the LTL formula $\varphi = \mathsf{F}p \wedge \mathsf{F}\neg p$ into a Büchi automaton.
The algorithm of [5] produces the automaton on the left of Figure 2. Applying the





Fig. 2. Büchi automata for $\mathsf{F}p \wedge \mathsf{F}\neg p$: without (left) and with (right) boolean optimization.



tableau rules to $\varphi$ yields $(p \wedge \mathsf{X}\mathsf{F}\neg p) \vee (\neg p \wedge \mathsf{X}\mathsf{F}p) \vee (\mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p)$. The three terms
of this cover correspond to the three initial states of the left automaton of Figure 2 ($n_1$,
$n_2$, and $n_3$). The third term of the disjunctive normal form is the consensus of the first
two; hence, it can be dropped. As a result, State $n_1$ is removed. Applying the tableau
rules to $\mathsf{F}p$ and $\mathsf{F}\neg p$ yields $\mathsf{F}p = p \vee \mathsf{X}\mathsf{F}p$ and $\mathsf{F}\neg p = \neg p \vee \mathsf{X}\mathsf{F}\neg p$, from which the
remaining five states of the automaton on the left of Figure 2 are produced. Notice, however,
that the alternative expansion $\mathsf{F}p = p \vee (\neg p \wedge \mathsf{X}\mathsf{F}p)$, $\mathsf{F}\neg p = \neg p \vee (p \wedge \mathsf{X}\mathsf{F}\neg p)$,
though it requires more literals, prevents the creation of two of these states, leading to the
automaton on the right of Figure 2.
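The two boolean facts used in Example 2 can be checked by exhaustive enumeration once each elementary formula is treated as an independent boolean variable. The sketch below is only a sanity check under that assumption; the variable names (`a` standing for X F p, `b` for X F ¬p) and the helper `equivalent` are chosen here for illustration.

```python
from itertools import product

def equivalent(f, g, nvars):
    """Compare two boolean functions on every assignment of nvars variables."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=nvars))

# p is the atomic proposition; a stands for X F p and b for X F (not p),
# each treated as an independent boolean literal.
with_consensus = lambda p, a, b: (p and b) or (not p and a) or (a and b)
without_consensus = lambda p, a, b: (p and b) or (not p and a)

# Two expansions of F p: p or X F p, versus p or (not p and X F p).
plain = lambda p, a: p or a
disjoint = lambda p, a: p or (not p and a)

print(equivalent(with_consensus, without_consensus, 3))  # consensus term is redundant
print(equivalent(plain, disjoint, 2))                    # both expansions agree
```

Both checks succeed, which is why the third term can be dropped and why the alternative expansion is a legitimate cover even though it uses more literals.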

We can formulate the choice of the optimum covers as a 0-1 ILP as follows. We define
one set of formulae $\Phi(\psi_0)$ and a set of terms $\Gamma(\psi_0)$ such that the automaton for $\psi_0$ will
be obtained from elementary covers for a subset of the formulae in $\Phi(\psi_0)$, and will have
states corresponding to a subset of $\Gamma(\psi_0)$. We then impose constraints that guarantee
that the automaton will recognize exactly the models of the LTL formula.

Definition 2. For an LTL formula $\varphi$, let $E(\varphi)$ be the expansion of $\varphi$ in terms of elementary
subformulae. Let $M(\varphi)$ be the set of minterms of $E(\varphi)$, and $P(\varphi)$ be the
set of prime implicants of $E(\varphi)$. Finally, for $\gamma$ a term of an elementary cover, let
$N(\gamma) = \bigwedge \{f : \mathsf{X}f \text{ is a literal of } \gamma\}$ be its next part.
The sets $\Phi(\psi_0)$ and $\Gamma(\psi_0)$ are the smallest sets that satisfy the following constraints:

$$\psi_0 \in \Phi(\psi_0), \qquad \Psi \subseteq \Phi(\psi_0) \Rightarrow P\big(\textstyle\bigwedge \Psi\big) \subseteq \Gamma(\psi_0), \qquad \gamma \in \Gamma(\psi_0) \Rightarrow N(\gamma) \in \Phi(\psi_0).$$

We associate 0-1 variables to the elements of $\Phi(\psi_0)$ and $\Gamma(\psi_0)$ as follows: $y_i = 1$ if
and only if an elementary cover of $\psi_i$ is a cover of the automaton; $x_j = 1$ if and only if
$\gamma_j$ is a state of the automaton. We then search for $\min \sum_i x_i$, subject to:

$$y_0 \wedge \bigwedge_{\psi_i \in \Phi} \Big( \neg y_i \vee \bigwedge_{\mu \in M(\psi_i)} \bigvee_{\gamma_k \in \Gamma} \big( x_k \wedge [\mu \leq \gamma_k \leq \psi_i] \big) \Big) \wedge \bigwedge_{\gamma_i \in \Gamma} \big( \neg x_i \vee y_{\iota(N(\gamma_i))} \big),$$

where $\iota(\psi_j) = j$. The intuition behind this formulation is that constructing the automaton
corresponds to finding DNF formulae for a set of boolean functions. (Each function
is an elementary cover for some LTL formula.) It is well known that the solution consists
of prime implicants of intersections of subsets of the functions in the set [15]. A function
needs to be in the set if it is the expansion of the given LTL formula, or if it is the
expansion of the next part of a term in the elementary cover chosen for a function that
is in the set. This situation gives rise to closure constraints analogous to those found in
the minimization of incompletely specified finite state machines [10]. Notice that we do
not guarantee the optimum Büchi automaton for the given LTL formula (cf. Example 5
in Section 6); rather, we guarantee that no closed set of elementary covers can be found
that has fewer terms, and such that each elementary cover can be generated from the
formula it covers by exclusive application of the tableau rules and the laws of boolean
algebra. Note that prior research on translation algorithms has focused on this class of
algorithms.



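Example 3. Continuing Example 2, let $\psi_0 = \mathsf{F}p \wedge \mathsf{F}\neg p$. We find:

$\psi_0 = \mathsf{F}p \wedge \mathsf{F}\neg p$:   $\gamma_0 = p \wedge \mathsf{X}\mathsf{F}\neg p$,   $\gamma_1 = \neg p \wedge \mathsf{X}\mathsf{F}p$,   $\gamma_2 = \mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p$
$\psi_1 = \mathsf{F}p$:   $\gamma_3 = p$,   $\gamma_4 = \mathsf{X}\mathsf{F}p$
$\psi_2 = \mathsf{F}\neg p$:   $\gamma_5 = \neg p$,   $\gamma_6 = \mathsf{X}\mathsf{F}\neg p$.

Since $\psi_0 = \psi_1 \wedge \psi_2$, no further formulae and terms are generated by considering subsets
of $\Phi(\psi_0)$. The minterms of the formulae in $\Phi(\psi_0)$ are:

$M(\psi_0) = \{p \wedge \mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,\ p \wedge \neg\mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,$
$\qquad \neg p \wedge \mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,\ \neg p \wedge \mathsf{X}\mathsf{F}p \wedge \neg\mathsf{X}\mathsf{F}\neg p\}$

$M(\psi_1) = \{p \wedge \mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,\ p \wedge \mathsf{X}\mathsf{F}p \wedge \neg\mathsf{X}\mathsf{F}\neg p,\ p \wedge \neg\mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,$
$\qquad p \wedge \neg\mathsf{X}\mathsf{F}p \wedge \neg\mathsf{X}\mathsf{F}\neg p,\ \neg p \wedge \mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,\ \neg p \wedge \mathsf{X}\mathsf{F}p \wedge \neg\mathsf{X}\mathsf{F}\neg p\}$

$M(\psi_2) = \{\neg p \wedge \mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,\ \neg p \wedge \mathsf{X}\mathsf{F}p \wedge \neg\mathsf{X}\mathsf{F}\neg p,\ \neg p \wedge \neg\mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,$
$\qquad \neg p \wedge \neg\mathsf{X}\mathsf{F}p \wedge \neg\mathsf{X}\mathsf{F}\neg p,\ p \wedge \mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p,\ p \wedge \neg\mathsf{X}\mathsf{F}p \wedge \mathsf{X}\mathsf{F}\neg p\}$.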
The constraint is:

$y_0 \wedge$
$[\neg y_0 \vee (x_0 \vee x_2) \wedge x_0 \wedge (x_1 \vee x_2) \wedge x_1] \wedge$
$[\neg y_1 \vee (x_0 \vee x_2 \vee x_3 \vee x_4) \wedge (x_3 \vee x_4) \wedge (x_0 \vee x_3) \wedge x_3 \wedge (x_1 \vee x_2 \vee x_4) \wedge (x_1 \vee x_4)] \wedge$
$[\neg y_2 \vee (x_1 \vee x_2 \vee x_5 \vee x_6) \wedge (x_1 \vee x_5) \wedge (x_5 \vee x_6) \wedge x_5 \wedge (x_0 \vee x_2 \vee x_6) \wedge (x_0 \vee x_6)] \wedge$
$(\neg x_0 \vee y_2) \wedge (\neg x_1 \vee y_1) \wedge (\neg x_2 \vee y_0) \wedge (\neg x_4 \vee y_1) \wedge (\neg x_6 \vee y_2)$,

which simplifies to $y_0 \wedge y_1 \wedge y_2 \wedge x_0 \wedge x_1 \wedge x_3 \wedge x_5$. The feasible assignment that minimizes
$\sum_{0 \leq i \leq 6} x_i$ is $y_0 = y_1 = y_2 = x_0 = x_1 = x_3 = x_5 = 1$, $x_2 = x_4 = x_6 = 0$. This
assignment corresponds to the solution found in Example 2. The fifth state of Figure 2
is required because $N(p) = N(\neg p) = \mathbf{T}$.
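Because the instance of Example 3 has only ten 0-1 variables, the constraint can be checked by brute force. The following sketch enumerates all assignments; it is not an ILP solver and not the procedure used in the paper, and the names `feasible`, `solutions`, and `best` are chosen here for illustration.

```python
from itertools import product

def feasible(y, x):
    """The constraint of Example 3 over y0..y2 and x0..x6 (tuples of 0/1)."""
    c0 = (x[0] or x[2]) and x[0] and (x[1] or x[2]) and x[1]
    c1 = ((x[0] or x[2] or x[3] or x[4]) and (x[3] or x[4]) and (x[0] or x[3])
          and x[3] and (x[1] or x[2] or x[4]) and (x[1] or x[4]))
    c2 = ((x[1] or x[2] or x[5] or x[6]) and (x[1] or x[5]) and (x[5] or x[6])
          and x[5] and (x[0] or x[2] or x[6]) and (x[0] or x[6]))
    # Closure: a chosen term forces a cover for its next part.
    closure = ((not x[0] or y[2]) and (not x[1] or y[1]) and (not x[2] or y[0])
               and (not x[4] or y[1]) and (not x[6] or y[2]))
    return bool(y[0] and (not y[0] or c0) and (not y[1] or c1)
                and (not y[2] or c2) and closure)

solutions = [(y, x) for y in product((0, 1), repeat=3)
             for x in product((0, 1), repeat=7) if feasible(y, x)]
best = min(solutions, key=lambda s: sum(s[1]))   # fewest selected terms
print(best)
```

Every feasible assignment sets $x_0 = x_1 = x_3 = x_5 = 1$, so the minimum selects exactly these four terms, in agreement with the simplified constraint above.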

Solving the 0-1 ILP exactly is expensive, and, as we just observed, only guarantees 
optimality within a class of algorithms; therefore, we adopt a heuristic approach. In 
choosing the elementary cover for a formula, we first obtain a prime and irredundant 
cover from the one produced by the approach of [5]. Then, we look for existing states 
that imply the new terms. If we find a state that exactly matches a new term, we do not 
need to create a new state for the latter. If we find an existing state that implies the new 
term, we try to reduce [3] the new term to the existing one. That is, we check whether 
the replacement of the new term by the existing one changes the function represented 
by the cover. For instance, in Example 2, the initial cover for $\mathsf{F}p$, $p \vee \mathsf{X}\mathsf{F}p$, is reduced
to $p \vee (\neg p \wedge \mathsf{X}\mathsf{F}p)$, because the second term already appears in the cover for $\varphi$.

If, on the other hand, we find a term that is implied by the new term, we check whether 
it can be reduced to the new term in all covers in which it already appears. We impose 
the constraint that the reduction only add atomic propositions or their negations. This 
constraint leaves the next part of the reduced term unchanged, and does not invalidate 
the covers obtained up to that point. 

5 Simplifying Büchi Automata Using Simulations

The Büchi automata produced by our translation algorithm are not necessarily optimal.
In this section we examine criteria that allow us to remove states from an automaton,
while retaining its language. The techniques presented in this section can be used to
simplify arbitrary Büchi automata. They do not always find the smallest possible Büchi
automaton. Rather, they minimize the number of states and transitions heuristically. We
deal with the simplification of the acceptance conditions in Section 6.

Our results derive from the notions of direct and reverse simulation. Direct simulation 
relations for Büchi automata have been studied in [6], and used in [1] for state-space
minimization. Raimi [17] uses both direct and reverse simulations to minimize the state 
space, but does not take fairness into account. 

In Subsection 5.3, we contrast simulation with language containment. Intuitively, 
simulation takes care of correspondence of acceptance conditions, which is one of the 
reasons why it is stronger than language containment. We show that this means that the 
conditions under which we can use simulation to reduce the number of states are more 
general than the conditions under which we can use language containment. 
It is clear that we can remove all states that are not reachable from at least one initial
state. Such states are not produced by the translation procedure, but they do occur as a
result of other optimizations. It is equally obvious that we can remove any state $q$ with
$\mathcal{L}(q) = \emptyset$ or $\delta(q) = \emptyset$.



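Definition 3. A direct simulation relation over the states of a Büchi automaton $\mathcal{A}$ is any
relation $\preceq\ \subseteq Q \times Q$ that satisfies the following property:

$$p \preceq q \text{ implies } \begin{cases} \mathcal{L}(p) \subseteq \mathcal{L}(q), \quad F \in \mathcal{F} \Rightarrow [p \in F \Rightarrow q \in F], \\ s \in \delta(p) \Rightarrow \exists t \in \delta(q) : s \preceq t. \end{cases}$$

The largest direct simulation relation is denoted by $\preceq_D$. If both $p \preceq_D q$ and $q \preceq_D p$,
then $p$ and $q$ are direct-simulation equivalent ($p \simeq_D q$).

A reverse simulation relation over the states of $\mathcal{A}$ is any relation $\preceq\ \subseteq Q \times Q$ that
satisfies the following property:

$$p \preceq q \text{ implies } \begin{cases} \mathcal{L}(p) \subseteq \mathcal{L}(q), \quad p \in Q_0 \Rightarrow q \in Q_0, \quad F \in \mathcal{F} \Rightarrow [p \in F \Rightarrow q \in F], \\ s \in \delta^{-1}(p) \Rightarrow \exists t \in \delta^{-1}(q) : s \preceq t. \end{cases}$$

The largest reverse simulation relation is denoted by $\preceq_R$. If both $p \preceq_R q$ and $q \preceq_R p$,
then $p$ and $q$ are reverse-simulation equivalent ($p \simeq_R q$).

The largest direct and reverse simulation relations can be found in polynomial time as
the greatest fixpoints of the recursive definitions. Our definition of direct simulation
corresponds to that of BSR-aa of [6]. If $p \preceq_D q$ in $\mathcal{A}$, then $L(\mathcal{A}^p) \subseteq L(\mathcal{A}^q)$ [16,6].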
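The greatest-fixpoint computation mentioned above can be sketched as a simple refinement loop: start from all pairs that satisfy the local conditions on labels and fair sets, then repeatedly delete pairs that violate the successor condition. This is a naive illustration, not an optimized algorithm; the function name and the dictionary-based representation of $\delta$ and the labeling are assumptions made here.

```python
def direct_simulation(states, delta, label, fair_sets):
    """Return the largest direct simulation as a set of pairs (p, q), meaning p <= q."""
    def locally_ok(p, q):
        # Label containment, and q lies in every fair set that contains p.
        return (label[p] <= label[q]
                and all(q in F for F in fair_sets if p in F))

    rel = {(p, q) for p in states for q in states if locally_ok(p, q)}
    changed = True
    while changed:                      # iterate down to the greatest fixpoint
        changed = False
        for (p, q) in list(rel):
            # Every successor of p must be simulated by some successor of q.
            if not all(any((s, t) in rel for t in delta[q]) for s in delta[p]):
                rel.discard((p, q))
                changed = True
    return rel

delta = {1: {3}, 2: {3}, 3: {3}}
label = {1: {"a"}, 2: {"a"}, 3: {"a", "b"}}
rel = direct_simulation([1, 2, 3], delta, label, [{3}])
print((1, 2) in rel and (2, 1) in rel)   # states 1 and 2 are simulation equivalent
```

The reverse simulation can be computed the same way by using predecessors instead of successors and adding the condition on initial states.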



5.1 Direct Simulation 

A simulation relation between two states may allow us to remove a transition without 
disturbing the simulation relation. 

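Theorem 1. Let $\mathcal{A}$ be a Büchi automaton. For $p, q \in Q$, $p \neq q$, assume that $p \preceq_D q$.
Let $\mathcal{M} = \langle Q, Q_{0\mathcal{M}}, \delta_{\mathcal{M}}, \mathcal{F}, D, \mathcal{L} \rangle$, where $Q_{0\mathcal{M}} = Q_0 \setminus \{p\}$ if $q \in Q_0$, and $Q_{0\mathcal{M}} = Q_0$
otherwise, and $\delta_{\mathcal{M}}$ is defined as follows:

$$\delta_{\mathcal{M}}(s) = \begin{cases} \delta(s) \setminus \{p\} & \text{if } q \in \delta(s), \\ \delta(s) & \text{otherwise.} \end{cases}$$

Then for all $s, t \in Q$, $s \preceq_D t$ in $\mathcal{A}$ if and only if $s \preceq_D t$ in $\mathcal{M}$. Also, any state in $\mathcal{A}$ is
simulation-equivalent to the state of the same name in $\mathcal{M}$.

Proof. (Sketch.) For the first statement, one can prove that the largest simulation relation
on $\mathcal{A}$ is a simulation relation on $\mathcal{M}$, using Definition 3 and transitivity of $\preceq_D$. In the
same manner, one can prove that the largest simulation relation on $\mathcal{M}$ is a simulation
relation on $\mathcal{A}$. This proves that $s \preceq_D t$ in $\mathcal{A}$ if and only if $s \preceq_D t$ in $\mathcal{M}$. The second
statement can also be proved by direct application of Definition 3. □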
The following corollary holds because simulation equivalence implies language equivalence.

Corollary 1. Let $\mathcal{A}$ and $\mathcal{M}$ be as in Theorem 1. Then, $L(\mathcal{A}) = L(\mathcal{M})$.

Theorem 1 obviously works in both directions. Hence, we can also add an arc from
$r$ to $p$ if $q \in \delta(r)$ and $p \preceq_D q$. Repeated application of this transformation gives us
the following corollary, which implies that we can remove one of any two
simulation-equivalent states.

Corollary 2. Let $\mathcal{A}$ be a Büchi automaton. Let $p, q \in Q$, $p \neq q$, and assume that
$p \simeq_D q$. Let $\mathcal{M} = \langle Q, Q_{0\mathcal{M}}, \delta_{\mathcal{M}}, \mathcal{F}, D, \mathcal{L} \rangle$, where $Q_{0\mathcal{M}} = (Q_0 \setminus \{p\}) \cup \{q\}$ if $p \in Q_0$,
and $Q_{0\mathcal{M}} = Q_0$ otherwise, and $\delta_{\mathcal{M}}$ coincides with $\delta$, except that, for all $s$, if $p \in \delta(s)$,
$\delta_{\mathcal{M}}(s) = (\delta(s) \setminus \{p\}) \cup \{q\}$. Then $L(\mathcal{A}) = L(\mathcal{M})$, and for all $s, t \in Q$, we have $s \preceq_D t$
in $\mathcal{A}$ if and only if $s \preceq_D t$ in $\mathcal{M}$.
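The transformation of Corollary 2 amounts to redirecting every arc that enters $p$ so that it enters $q$ instead. A minimal sketch, assuming the caller has already established that $p$ and $q$ are direct-simulation equivalent (the function name `merge_into` and the dictionary representation are illustrative, not from the paper):

```python
def merge_into(p, q, initial, delta):
    """Redirect every arc into p so that it enters q instead (Corollary 2).

    Assumes p and q are direct-simulation equivalent; that check is not
    performed here.
    """
    new_initial = (initial - {p}) | {q} if p in initial else set(initial)
    new_delta = {s: ((succ - {p}) | {q} if p in succ else set(succ))
                 for s, succ in delta.items()}
    return new_initial, new_delta

initial, delta = {0}, {0: {1, 2}, 1: {3}, 2: {3}, 3: {3}}
new_initial, new_delta = merge_into(1, 2, initial, delta)
print(new_delta[0])   # state 0 now leads only to state 2
```

After the redirection, $p$ has no incoming arcs and is not initial, so it is unreachable and can be deleted by the reachability pruning mentioned at the start of this section.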

5.2 Reverse Simulation 

In this subsection, we present techniques similar to the one in the last subsection, but 
pertaining to reverse similarity. 

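Theorem 2. Let $\mathcal{A}$ be a Büchi automaton. For $p, q \in Q$, $p \neq q$, assume that $p \preceq_R q$.
Let $\mathcal{M} = \langle Q, Q_0, \delta_{\mathcal{M}}, \mathcal{F}, D, \mathcal{L} \rangle$, where $\delta_{\mathcal{M}}$ coincides with $\delta$, except that $\delta_{\mathcal{M}}(p) =
\delta(p) \setminus \delta(q)$. Then for all $s, t \in Q$, $s \preceq_R t$ in $\mathcal{A}$ if and only if $s \preceq_R t$ in $\mathcal{M}$, and any state
in $\mathcal{A}$ is reverse-simulation equivalent to the state of the same name in $\mathcal{M}$.

Corollary 3. Let $\mathcal{A}$ and $\mathcal{M}$ be as in Theorem 2. Then, $L(\mathcal{A}) = L(\mathcal{M})$.

Proof. (Sketch.) For any accepting run $\rho$ of $\mathcal{A}$, we can construct an accepting run $\rho'$
of $\mathcal{M}$. We can do this because for every $i$, we can choose a state $\rho'_i$ in $\mathcal{M}$ such that
$\rho_i \preceq_R \rho'_i$, $\rho'_i \in Q_0$ for $i = 0$, and $\rho'_i \in \delta(\rho'_{i-1})$ for $i > 0$. □

In analogy to direct simulation, we only need to retain one state in every reverse-similarity
equivalence class.

Corollary 4. Let $\mathcal{A}$ be a Büchi automaton. Let $p, q \in Q$, $p \neq q$, and assume that
$p \simeq_R q$. Let $\mathcal{M} = \langle Q, Q_0, \delta_{\mathcal{M}}, \mathcal{F}, D, \mathcal{L} \rangle$, where $\delta_{\mathcal{M}}(p) = \emptyset$, $\delta_{\mathcal{M}}(q) = \delta(p) \cup \delta(q)$, and
$\delta_{\mathcal{M}}(s) = \delta(s)$ for $s \notin \{p, q\}$. Then $L(\mathcal{A}) = L(\mathcal{M})$, and for all $s, t \in Q$ we have $s \preceq_R t$
in $\mathcal{A}$ if and only if $s \preceq_R t$ in $\mathcal{M}$.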

5.3 Language Containment 

In this section we consider the more general case of language containment, and contrast 
it with simulation relations. The techniques in this section are not used in the algorithm,
since language inclusion cannot be checked easily, and because the minimization
requires stricter conditions in the case of language containment. Indeed, we cannot simply
substitute $L(\mathcal{A}^p) \subseteq L(\mathcal{A}^q)$ for $p \preceq_D q$ in Theorem 1, as evidenced by the following
example.
Fig. 3. Automaton showing that $L(\mathcal{A}^p) = L(\mathcal{A}^q)$ is not sufficient to allow simplification.



Example 4. Consider the automaton of Figure 3. The languages $L(\mathcal{A}^p)$ and $L(\mathcal{A}^q)$ are
the same (they correspond to the formula $\mathsf{G}\mathsf{F}a$). State $s$ is a common predecessor of $p$
and q. However, we cannot remove the arc from s to p without making the language of 
the automaton empty. The problem is that the accepting runs starting at q must use the 
arc that we want to remove in order to reach the accepting state. Notice that q does not 
direct simulate p. 

The next theorem is analogous to Corollary 1 for direct simulation. It allows us
to remove an arc from a state $r$ to a successor $p$ if $r$ has another successor $q$ with
$L(\mathcal{A}^p) \subseteq L(\mathcal{A}^q)$. We impose a condition sufficient to guarantee that $L(\mathcal{A}^q)$ is not
changed by removal of the arc.

Theorem 3. Let $\mathcal{A}$ be a Büchi automaton. For $p, q \in Q$, $p \neq q$, assume that $L(\mathcal{A}^p) \subseteq
L(\mathcal{A}^q)$. Let $\mathcal{M} = \langle Q, Q_{0\mathcal{M}}, \delta_{\mathcal{M}}, \mathcal{F}, D, \mathcal{L} \rangle$, where $Q_{0\mathcal{M}} = Q_0 \setminus \{p\}$ if $q \in Q_0$, and
$Q_{0\mathcal{M}} = Q_0$ otherwise, and $\delta_{\mathcal{M}}$ coincides with $\delta$, except that for all $r \in Q$ such that
$q \in \delta(r)$ and $r$ is not reachable from $q$, $\delta_{\mathcal{M}}(r) = \delta(r) \setminus \{p\}$. Then $L(\mathcal{A}) = L(\mathcal{M})$.

By repeated application of Theorem 3, we can also prove that we can merge two
language-equivalent states $p$ and $q$, as long as $q$ cannot reach any predecessors of $p$, in analogy to
Corollary 2.

6 Pruning the Fair Sets 

The automata produced by the algorithm of Section 4 have as many accepting conditions 
as there are until subformulae in the LTL formula. In this section we show how to shrink 
some fair sets or drop them altogether. Simplifying the fair sets has several benefits. First 
of all, it may lead to a reduction in strength of the resulting automaton. (For instance, 
if the automaton is reduced to terminal, model checking requires a simple reachability 
analysis.) Even if the strength of the automaton is not reduced, fewer, smaller fair sets 
usually lead to faster convergence of the language emptiness check. Simplifying the 
acceptance conditions may also enable further reductions in the number of states or 
transitions. Finally, the resulting automaton is often easier to understand. 

The pruning of the fair sets is based on the analysis of the strongly connected
components (SCCs) of the state graph.

Definition 4. An SCC of a Büchi automaton $\mathcal{A}$ is fair if it is non-trivial and it intersects
all fair sets. Let $\Theta_{\mathcal{A}}$ be the union of all fair SCCs of $\mathcal{A}$; $\Theta_{\mathcal{A}}$ is called the final set of $\mathcal{A}$.

Every accepting run of an automaton is eventually contained in a fair SCC. Hence, states
not in $\Theta_{\mathcal{A}}$ can be removed from the accepting sets. Furthermore, states with no paths
to $\Theta_{\mathcal{A}}$ can be removed altogether: their language is empty. Fair sets that contain $\Theta_{\mathcal{A}}$, or
that include another fair set, can be dropped without changing the language accepted by
the automaton. It is not uncommon for one fair set to become included in another once
a few states are dropped from it, as a consequence, for instance, of the application of the
following results.
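The final set and the first pruning steps can be sketched as follows. The SCC computation below uses plain mutual reachability, which is quadratic but adequate for the small property automata considered here; a linear-time SCC algorithm could be substituted. All function names are illustrative assumptions, and only two of the pruning rules are shown (restriction to the final set, and dropping fair sets that contain it).

```python
def sccs(states, delta):
    """Strongly connected components via mutual reachability."""
    def reach(s):
        seen, stack = set(), [s]
        while stack:
            for t in delta[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen                      # states reachable in one or more steps
    r = {s: reach(s) for s in states}
    comps = []
    for s in states:
        comp = frozenset({s} | {t for t in r[s] if s in r[t]})
        if comp not in comps:
            comps.append(comp)
    return comps, r

def final_set(states, delta, fair_sets):
    """Union of all fair SCCs: non-trivial and intersecting every fair set."""
    comps, r = sccs(states, delta)
    fair = [c for c in comps
            if any(s in r[s] for s in c)           # contains at least one cycle
            and all(c & F for F in fair_sets)]
    return set().union(*fair)

def prune(states, delta, fair_sets):
    """Restrict fair sets to the final set; drop fair sets that contain it."""
    final = final_set(states, delta, fair_sets)
    restricted = [F & final for F in fair_sets]
    return final, [F for F in restricted if not final <= F]

delta = {0: {1}, 1: {2}, 2: {1, 3}, 3: {3}}
final, fair = prune([0, 1, 2, 3], delta, [{1, 3}, {0, 2}])
print(final, fair)
```

In this small example the self-loop on state 3 forms a non-trivial SCC, but it misses the second fair set, so only the component containing states 1 and 2 is fair.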

When one fair set is contained in another we can remove states from the latter if they 
do not appear in the former. We can extend this result by focusing on a single SCC. 

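Theorem 4. Let $\mathcal{A} = \langle Q, Q_0, \delta, \mathcal{F}, D, \mathcal{L} \rangle$ be a Büchi automaton and let $\gamma$ be an SCC
of $\mathcal{A}$. Suppose that there exist $F, F' \in \mathcal{F}$ such that $F \cap \gamma \subseteq F' \cap \gamma$. Let $\mathcal{M} =
\langle Q, Q_0, \delta, \mathcal{F}_{\mathcal{M}}, D, \mathcal{L} \rangle$, where

$$\mathcal{F}_{\mathcal{M}} = (\mathcal{F} \setminus \{F'\}) \cup \{F'_{\mathcal{M}}\}, \quad \text{with} \quad F'_{\mathcal{M}} = F' \setminus (\gamma \setminus F).$$

Then $L(\mathcal{A}) = L(\mathcal{M})$.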

The next theorem, which shows that a state can be removed from a fair set if there is 
a suitable ‘detour’, will be followed by an example. 

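Theorem 5. Let $A = \langle Q, Q_0, \delta, \mathcal{F}, D, \mathcal{L} \rangle$ be a Büchi automaton. Let $F$ be a fair set
in $\mathcal{F}$, and $p$, $q$ distinct states in $F$ such that $\mathcal{L}(p) \subseteq \mathcal{L}(q)$, $\delta(p) \subseteq \delta(q)$, and
$\delta^{-1}(p) \subseteq \delta^{-1}(q)$. Let $M = \langle Q, Q_0, \delta, \mathcal{F}_M, D, \mathcal{L} \rangle$, where
$\mathcal{F}_M = (\mathcal{F} \setminus \{F\}) \cup \{F \setminus \{p\}\}$. Then $L(A) = L(M)$.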

Example 5. Two automata for $G(Fp \wedge Fq)$ are shown in Figure 4. Theorem 5 applies to
State $n_1$ of the automaton on the left. After its removal from the two fair sets, State $n_1$
can be merged with State $n_4$. The resulting automaton is shown on the right of Figure 4.
It is worth pointing out that the automaton on the left is constructed from a minimum
cost elementary cover of $G(Fp \wedge Fq)$. Since there is no elementary cover with fewer
than four terms, there is no way to generate the three-state solution by direct application
of the translation algorithm of Section 4.

Fig. 4. Büchi automata for $G(Fp \wedge Fq)$ illustrating the application of Theorem 5.

Definition 5. Let p Q be like p q, but without the condition on the fair sets. 

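Theorem 6. Let $A = \langle Q, Q_0, \delta, \mathcal{F}, D, \mathcal{L} \rangle$ be a Büchi automaton. Let $\gamma \subseteq Q$ be an
SCC, and $q \in \gamma$ a state. Suppose that for all $p \in \gamma$, $p$ is related to $q$ by the relation of
Definition 5, $\delta(p) \subseteq \delta(q)$, and $q \in \delta(p)$. Let $M = \langle Q, Q_0, \delta_M, \mathcal{F}, D, \mathcal{L} \rangle$ with
$\delta_M(s) = \delta(s) \setminus (\gamma \setminus \{q\})$ for $s \in \gamma$, $s \neq q$, and $\delta_M(s) = \delta(s)$ otherwise.
Then $L(A) = L(M)$.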

Proof. Clearly, $L(A) \supseteq L(M)$. Let $\rho$ be an accepting run for $\sigma \in D^\omega$ in $A$. If $\rho$ does
not go through $\gamma$, it is also an accepting run for $\sigma$ in $M$. If $\rho$ enters and exits $\gamma$, let $(s, p)$
be the last arc of $\rho$ before it leaves $\gamma$. If $s \notin \gamma$, $\rho$ does not use any arc internal to $\gamma$.
Hence, it is also an accepting run for $\sigma$ in $M$. If $s \in \gamma$, $s$ is related to $q$ by the relation of
Definition 5. (This is not changed by the removal of arcs internal to $\gamma$.) Hence, there is
a run $\rho_M$ in $M$ that reaches $q$ when $\rho$ reaches $s$, goes to $p$ (also a successor of $q$) next,
and then continues identical to $\rho$.

If $\rho$ dwells in $\gamma$ forever ($\inf(\rho) \subseteq \gamma$), $\gamma$ is a fair SCC (it intersects all fair sets).
For each $F_i \in \mathcal{F}$, choose $s_i \in (F_i \cap \inf(\rho))$. We can build a run $\rho_M$ such that every
occurrence of $s_i$ is followed by an occurrence of $s_{(i+1) \bmod |\mathcal{F}|}$. This can be done by
going from $s_i$ to $q$, waiting in $q$ until $\rho$ goes to $s_{(i+1) \bmod |\mathcal{F}|}$, and then going to that state.
We may “skip a beat” if $s_{(i+1) \bmod |\mathcal{F}|}$ immediately follows $s_i$ in $\rho$, but we shall just take
the next occurrence of $s_{(i+1) \bmod |\mathcal{F}|}$. Since $p$ is related to $q$ by the relation of Definition 5
for all $p \in \gamma$, $\mathcal{L}(p) \subseteq \mathcal{L}(q)$; hence, $\rho_M$ accepts $\sigma$. □

Thanks to Theorem 6 we can remove the arcs out of States $n_2$ and $n_3$ in the right
automaton of Figure 4, except those going to State $n_4$. It is also possible to remove
$n_2$ and $n_3$ from the set of initial states, though this is not covered by the theorem.

The next result is based on the observation that, in determining what states of an 
SCC should belong to a fair set, we can ignore the arcs out of the SCC. 

Theorem 7. Let $A = \langle Q, Q_0, \delta, \mathcal{F}, D, \mathcal{L} \rangle$ be a Büchi automaton. Let $\gamma$ be an SCC of $A$,
and p a state in 7 such that, when the arcs out of^ are removed, q £ j implies q P 
and q Fr p. Let M. = (Q, Qo, S, Tm, D, £), where Tm = {F \ {p}\F £ F}. Then 
L{A) = L{M). 

If every cycle through a state p visits a fair set in some other state, we do not need 
to include p in that fair set. 

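Theorem 8. Let $A = \langle Q, Q_0, \delta, \mathcal{F}, D, \mathcal{L} \rangle$ be a Büchi automaton. Let $F \in \mathcal{F}$, and $p$ a
state in $F$ such that every cycle through $p$ intersects $F$ in a state different from $p$. Let
$M = \langle Q, Q_0, \delta, \mathcal{F}_M, D, \mathcal{L} \rangle$, where $\mathcal{F}_M = (\mathcal{F} \setminus \{F\}) \cup \{F \setminus \{p\}\}$. Then $L(A) = L(M)$.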
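The hypothesis of Theorem 8 can be checked by a single graph search: every cycle through $p$ meets $F$ in another state exactly when $p$ lies on no cycle once the states of $F \setminus \{p\}$ are deleted. The sketch below assumes the same hypothetical encoding as before (a dictionary from states to successor sets); the function name is illustrative only.

```python
from typing import Dict, Set


def can_drop_from_fair_set(succ: Dict[str, Set[str]], fair_set: Set[str], p: str) -> bool:
    """True iff every cycle through p visits fair_set in a state other than p."""
    blocked = set(fair_set) - {p}
    # Look for a cycle through p that avoids all blocked states.
    stack = [q for q in succ.get(p, ()) if q not in blocked]
    seen: Set[str] = set()
    while stack:
        q = stack.pop()
        if q == p:
            return False  # found a cycle that meets the fair set only in p
        if q in seen:
            continue
        seen.add(q)
        stack.extend(r for r in succ.get(q, ()) if r not in blocked)
    return True


# Hypothetical three-state cycle p -> q -> r -> p.
cycle = {"p": {"q"}, "q": {"r"}, "r": {"p"}}
```

With the fair set `{"p", "r"}` the only cycle through `p` also visits `r`, so `p` can be dropped; with the fair set `{"p"}` it cannot.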

After application of the results presented in this section, a Büchi automaton may still have
multiple fair sets. It is well-known that we can always reduce the number of acceptance
conditions to one by introducing a counter. This is not usually done because the algorithm
of Emerson and Lei [7] deals with multiple acceptance conditions. However, for weak
and terminal automata, we do not need to resort to the counter to reduce the number of
fair sets to one.

Definition 6. A Büchi automaton is weak if and only if for each SCC $\gamma$, either for each
fair set $F$, $\gamma \subseteq F$, or there exists a fair set $F$ such that $\gamma \cap F = \emptyset$. A Büchi automaton
is terminal if and only if it is weak, there is no arc from a fair SCC to a non-fair SCC,
and for every state $s$ in a fair SCC, $\bigcup \{\mathcal{L}(t) : t \in \delta(s)\} = D$.



Theorem 9. Let $A = \langle Q, Q_0, \delta, \mathcal{F}, D, \mathcal{L} \rangle$ be a weak (generalized) Büchi automaton.
Let $M = \langle Q, Q_0, \delta, \{\Phi_A\}, D, \mathcal{L} \rangle$. Then $L(A) = L(M)$.
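A small sketch of the weakness test of Definition 6 and of the replacement of Theorem 9 follows. It is only an illustration under assumptions: the automaton is again a dictionary of successor sets, SCCs are found by a quadratic reachability closure rather than a linear-time algorithm, and single-state components are skipped because they satisfy the weakness condition trivially.

```python
from typing import Dict, FrozenSet, List, Set


def reachable(succ: Dict[str, Set[str]], v: str) -> Set[str]:
    """States reachable from v through at least one arc."""
    seen: Set[str] = set()
    stack = list(succ.get(v, ()))
    while stack:
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(succ.get(w, ()))
    return seen


def nontrivial_sccs(succ: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """SCCs that contain at least one arc, via pairwise mutual reachability."""
    reach = {v: reachable(succ, v) for v in succ}
    comps: List[FrozenSet[str]] = []
    for v in succ:
        if v in reach[v]:  # v lies on a cycle
            comp = frozenset(w for w in reach[v] if v in reach.get(w, ()))
            if comp not in comps:
                comps.append(comp)
    return comps


def is_weak(succ: Dict[str, Set[str]], fair_sets: List[Set[str]]) -> bool:
    """Each SCC is contained in every fair set or disjoint from at least one."""
    for comp in nontrivial_sccs(succ):
        inside_all = all(comp <= f for f in fair_sets)
        misses_one = any(not (comp & f) for f in fair_sets)
        if not (inside_all or misses_one):
            return False
    return True


def single_fair_set(succ: Dict[str, Set[str]], fair_sets: List[Set[str]]) -> List[Set[str]]:
    """For a weak automaton, return the final set as the only fair set."""
    if not is_weak(succ, fair_sets):
        raise ValueError("automaton is not weak")
    phi: Set[str] = set()
    for comp in nontrivial_sccs(succ):
        if all(comp & f for f in fair_sets):
            phi |= comp
    return [phi]


# Hypothetical automata: the first is weak, the second is not.
weak_aut = {"a": {"a", "b"}, "b": {"b"}}
strong_aut = {"a": {"b"}, "b": {"a"}}
```

In `strong_aut` with fair set `{"a"}`, the single SCC meets the fair set without being contained in it, which is exactly the situation Definition 6 excludes.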

States that are not in $\Phi_A$ can be added to fair sets as long as $\Phi_A$ is not changed by the
addition. In particular, states in trivial SCCs are “don’t care” states when it comes to
checking the conditions on $\mathcal{F}$ in Definition 3.

The automaton simplification procedure prunes the fair sets a first time before ap- 
plying simulation-based simplification, because smaller fair sets tend to produce larger 
simulation relations. The simulation-based techniques may break up SCCs because they 
remove arcs. Hence, pruning is performed a second time after they are applied. This can 
be iterated until a fixpoint is reached, but the benefits are minor and the extra CPU time 
comparatively large. 



7 Experimental Results 

In this section we report the results obtained with a translator from LTL to Büchi automata
named Wring and based on the results discussed thus far. Wring is written in Perl. Based 
on our experience, we estimate that speed could be increased by at least an order of 
magnitude by coding the algorithms in C. However, CPU times are modest, and even a 
relatively slow implementation is more than adequate. 

Table 1 shows a comparison to the algorithms analyzed in [5] on common formu- 
lae and formulae found in the literature. It should be noted that the outcomes of the 
translation algorithms depend on the order in which sub-formulae are examined. Hence, 
different implementations may produce different results. Rewriting of the formula (see 
Section 3) is most effective in detecting tautologies and contradictions. The transforma- 
tions discussed in Section 6 are quite effective in reducing the number of fair sets. 

Table 2 shows results for randomly generated formulae [5], showing the effects of 
each of the three major extensions that we propose. It should be noted that working on 
simpler formulae and smaller automata offsets most of the additional cost of minimiza- 
tion. One can see from Table 2 that the reduction in transitions and fair sets corresponds 
to an increase in the number of terminal automata. Though we do not present a detailed 
analysis of the dependence of the results on the statistics of the formulae (number of 
nodes, number of atomic propositions, and percentage of temporal operators), we have 
observed the same trends reported in [5]. 



Table 1. Comparison to our implementation of the algorithms analyzed in [5]. For each formula
and each method, the numbers of states, transitions, and accepting conditions are given. The letters
in the accepting sets columns indicate whether the automata are strong (s), weak (w), or terminal
(t).

Table 2. Results for 1000 random formulae with parse graphs of 15 nodes, using 3 atomic
propositions, and uniform distribution of the operators ($\vee$, $\wedge$, X, U, R) for the internal nodes.
In the designation of the method, ‘r’ stands for rewriting, ‘b’ stands for boolean optimization,
and ‘s’ stands for automaton simplification. Method Wring is equivalent to LTL2AUT+rbs. The
experiments were run on an IBM Intellistation with a 400 MHz Pentium II CPU.

8 Conclusions and Future Work

We have presented a heuristic algorithm for the generation of small Büchi automata from
LTL formulae. It works in three stages: rewriting of the formula, boolean optimization
to reduce the number of states in the translation procedure, and simplification of the
automaton. Rewriting is a cheap, simple, and effective technique to shrink the formula
and reduce redundancy. The boolean optimization technique that we have presented
yields the smallest translation of any procedure in its class. We do not know of any
translation procedures that do not fall within this class, its most important characteristic
being the use of the given expansion functions. Since the optimal algorithm is expensive,
we have implemented a heuristic approximation. Finally, the simplification step uses
direct and reverse simulation to minimize the number of states and transitions in the
automaton, and it prunes the acceptance conditions to make the emptiness check easier.

We have shown that our algorithm outperforms previously published algorithms on 
both random formulae, and formulae in common use and from the literature. 

We are investigating weaker relations that allow for state minimization, and can 
still be computed efficiently. In particular, we are working on combinations of direct 
and reverse simulation, and simulation with less strict conditions on the acceptance 
conditions such as BSR-lc of [6]. Another area of investigation is the use of semantic 
information about the literals in the simplification of elementary covers. For symbolic 
model checking, the automata must be given binary encodings. A careful choice of the 
state encodings and the use of don’t care conditions derived from simulation relations and 
from the analysis of the SCCs of the graph should help reduce the cost of the language 
emptiness check.

Abstract. A new approach is presented for detecting whether a particular
computation of an asynchronous distributed system satisfies Poss $\Phi$
(read “possibly $\Phi$”), meaning the system could have passed through
a global state satisfying predicate $\Phi$, or Def $\Phi$ (read “definitely $\Phi$”),
meaning the system definitely passed through a global state satisfying
$\Phi$. Detection can be done easily by straightforward state-space search;
this is essentially what Cooper and Marzullo proposed. We show that
the persistent-set technique, a well-known partial-order method for
optimizing state-space search, provides efficient detection. This approach
achieves the same worst-case asymptotic time complexity as two
special-purpose detection algorithms of Garg and Waldecker that detect Poss $\Phi$
and Def $\Phi$ for a restricted but important class of predicates. For Poss $\Phi$,
our approach applies to arbitrary predicates and thus is more general
than Garg and Waldecker’s algorithm. We apply our algorithm for
Poss $\Phi$ to two examples, achieving a speedup of over 700 in one example
and over 70 in the other, compared to unoptimized state-space search.



1 Introduction 

Detecting global properties (i.e., predicates on global states) in distributed
systems is useful for monitoring and debugging. For example, when testing a
distributed mutual exclusion algorithm, it is useful to monitor the system to detect
concurrent accesses to the critical sections. A system that performs leader
election may be monitored to ensure that processes agree on the current leader.
A system that dynamically partitions and re-partitions a large dataset among a
set of processors may be monitored to ensure that each portion of the dataset is
assigned to exactly one processor.

An asynchronous distributed system is characterized by lack of synchronized 
clocks and lack of bounds on processor speed and network latency. In such a 
system, no process can determine in general the order in which events on different 
processors actually occurred. Therefore, no process can determine in general the
sequence of global states through which the system passed. This leads to an
obvious difficulty for detecting whether a global property held.

Cooper and Marzullo’s solution to this difficulty involves two modalities, 
which we denote by Poss (read “possibly”) and Def (read “definitely”) [CM91]. 
These modalities are based on logical time as embodied in the happened-before 
relation, a partial order that reflects causal dependencies [Lam78]. A history of an 
asynchronous distributed system can be approximated by a computation, which 
comprises the local computation of each process together with the happened- 
before relation. Happened-before is useful for detection algorithms because, using 
vector clocks [Fid88,Mat89,SW89], it can be determined by processes in the 
system. 

Happened-before is not a total order, so it does not uniquely determine the
history. But it does restrict the possibilities. Histories consistent with a
computation $c$ are those sequences of the events in $c$ that correspond to total orders
containing the happened-before relation. A consistent global state (CGS) of a
computation $c$ is a global state that appears in some history consistent with $c$. A
computation $c$ satisfies Poss $\Phi$ iff, in some history consistent with $c$, the system
passes through a global state satisfying $\Phi$. A computation $c$ satisfies Def $\Phi$ iff, in
all histories consistent with $c$, the system passes through a global state satisfying
$\Phi$.

Cooper and Marzullo give centralized algorithms for detecting Poss $\Phi$ and
Def $\Phi$ [CM91]. A stub at each process reports the local states of that process
to a central monitor. The monitor incrementally constructs a lattice whose
elements correspond to CGSs of the computation. Poss $\Phi$ and Def $\Phi$ are evaluated
by straightforward traversals of the lattice. In a system of $N$ processes, the
worst-case number of CGSs, which can occur in computations containing little
communication, is $O(S^N)$, where $S$ is the maximum number of steps taken by
a single process. Any detection algorithm that enumerates all CGSs, like the
algorithms in [CM91,MN91,JMN95,AV94], has time complexity that is at least
linear in the number of CGSs. This time complexity can be prohibitive. This
motivated the development of efficient algorithms for detecting restricted classes of
predicates [TG93,GW94,GW96,CG98]. The algorithms of Garg and Waldecker
are classic examples of this approach. A predicate is $n$-local if it depends on the
local states of at most $n$ processes. In [GW94] and [GW96], Garg and Waldecker
give efficient algorithms that detect Poss $\Phi$ and Def $\Phi$, respectively, for
predicates $\Phi$ that are conjunctions of 1-local predicates. Those two algorithms are
presented as two independent works, with little relationship to each other or to
existing techniques.

This paper shows that efficient detection of global predicates can be done
using a well-known partial-order method. Partial-order methods are optimized
state-space search algorithms that try to avoid exploring multiple interleavings
of independent transitions [PPH97]. This approach achieves the same worst-case
asymptotic time complexity as the two aforementioned algorithms of Garg and
Waldecker, assuming weak vector clocks [MN91], which are updated only by
events that can change the truth value of $\Phi$ and by receive events by which
a process first learns of some event that can change the truth value of $\Phi$, are
used with our algorithm. Specifically, we show that persistent-set selective
search [God96] can be used to detect Poss $\Phi$ or Def $\Phi$ for conjunctions of 1-local
predicates with time complexity $O(N^2 S)$. In some non-worst cases, Garg and
Waldecker’s algorithms may be faster than ours by up to a factor of $N$, because
their algorithms also incorporate an idea, not captured by partial-order methods,
by which the algorithm ignores local states of a process that do not satisfy the
1-local predicate associated with that process. For details, see [SUL99].

S.D. Stoller, L. Unnikrishnan, and Y.A. Liu

Our method for detecting Def $\Phi$ handles only conjunctions of 1-local
predicates. Our method for detecting Poss $\Phi$ handles arbitrary predicates and thus is
more general than Garg and Waldecker’s algorithm. Furthermore, our method
is asymptotically faster than Cooper and Marzullo’s algorithm for some classes
of systems to which Garg and Waldecker’s algorithm does not apply. For
some other classes of systems, although our method has the same asymptotic
worst-case complexity as Cooper and Marzullo’s algorithm, we expect our
method to be significantly faster in practice; this is typical of general experience
with partial-order methods. Our algorithm for detecting Poss $\Phi$ can be further
optimized to sometimes explore sequences of transitions in a single step. This
can provide significant speedup, even reducing the asymptotic time and space
complexities for certain classes of systems.

We give simple specialized algorithms $PS_{Poss}$ and $PS_{Def}$ for computing
persistent sets for detection of Poss $\Phi$ and Def $\Phi$, respectively. These algorithms
exploit the structure of the problem in order to efficiently compute small
persistent sets. One could instead use a general-purpose algorithm for computing
persistent sets, such as the conditional stubborn set algorithm (CSSA) [God96,
Section 4.7], which is based on Valmari’s work on stubborn sets [Val97]. When
CSSA is used for detecting Poss $\Phi$, it is either ineffective (i.e., it returns the set
of all enabled transitions) or slower than $PS_{Poss}$ by a factor of $S$ (and possibly
by some factors of $N$) in the worst case, depending on how it is applied. The
cheaper algorithms for computing persistent sets in [God96] are ineffective for
detecting Poss $\Phi$. When CSSA (or any of the other algorithms in [God96]) is
used for detecting Def $\Phi$, it is ineffective. Detailed justifications of these claims
appear in [SUL99].

For simplicity, we present algorithms for off-line property detection, in which 
the detection algorithm is run after the distributed computation has terminated. 
Our approach can also be applied to on-line property detection, in which a 
monitor runs concurrently with the system being monitored. 

Property detection is a special case of model-checking of temporal logics 
interpreted over partially-ordered sets of global configurations, as described in 
[AMP98,Wal98]. Those papers do not discuss in detail the use of partial-order
methods to avoid exploring all global states and do not characterize classes of 
global predicates for which partial-order methods reduce the worst-case asym- 
ptotic time complexity. Alur et al. give a decision procedure for the logic ISTL^ . 
Poss^ is expressible in ISTL^ as Def is expressible in ISTL as 
but appears not to be directly expressible in ISTL^.

An avenue for future work is to try to extend this approach to efficient
analysis of message sequence charts [MPS98,AY99].

2 Background on Property Detection 

A local state of a process is a mapping from identifiers to values. Thus, s(v) 
denotes the value of variable v in local state s. A history of a single process is 
represented as a sequence of that process’s states. Let [m..n] denote the set of 
integers from $m$ to $n$, inclusive. We use integers $[1..N]$ as process names.

In the distributed computing literature, the most common representation of
a computation $c$ of an asynchronous distributed system is a collection of histories
$c[1], \ldots, c[N]$, one for each constituent process, together with a happened-before
relation $\to$ on local states [GW94]. For a sequence $h$, let $h[k]$ denote the $k$'th
element of $h$ (i.e., we use 1-based indexing), and let $|h|$ denote the length of $h$.
Intuitively, a local state $s_1$ happened-before a local state $s_2$ if $s_1$ finished before
$s_2$ started. Formally, $\to$ is the smallest transitive relation on the local states in
$c$ such that

1. $\forall i \in [1..N], k \in [1..|c[i]|-1] : c[i][k] \to c[i][k+1]$.

2. For all local states $s_1$ and $s_2$ in $c$, if the event immediately following $s_1$ is the
sending of a message and the event immediately preceding $s_2$ is the reception
of that message, then $s_1 \to s_2$.

We always use $S$ to denote the maximum number of local states per process,
i.e., $\max(|c[1]|, \ldots, |c[N]|)$.

Each process has a distinguished variable $vt$ such that for each local state $s$,
$s(vt)$ is a vector timestamp [Mat89], i.e., an array of $N$ natural numbers such
that $s(vt)[i]$ is the number of local states of process $i$ that happened-before
$s$. Vector timestamps capture the happened-before relation. Specifically, for all
local states $s_1$ and $s_2$, $s_1 \to s_2$ iff $(\forall i \in [1..N] : s_1(vt)[i] \le s_2(vt)[i])$. Two local
states $s_1$ and $s_2$ of a computation are concurrent, denoted $s_1 \parallel s_2$, iff neither
happened-before the other: $s_1 \parallel s_2 = s_1 \not\to s_2 \wedge s_2 \not\to s_1$.

A global state $s$ of a computation $c$ is an array of $N$ local states such that,
for each process $i$, $s[i]$ appears in $c[i]$. A global state is consistent iff its
constituent local states are pairwise concurrent. Intuitively, consistency means that the
system could have passed through that global state during the computation.

Concurrency of two local states can be tested in constant time using vector
timestamps by exploiting the following theorem [FR94]: for a local state $s_1$ of
process $i_1$ and a local state $s_2$ of process $i_2$, $s_1 \parallel s_2$ iff
$s_2(vt)[i_2] \ge s_1(vt)[i_2] \wedge s_1(vt)[i_1] \ge s_2(vt)[i_1]$. Thus, a global state $s$ is consistent iff
$(\forall i, j \in [1..N] : s[i](vt)[i] \ge s[j](vt)[i])$.
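The two tests above translate directly into code. The following sketch is illustrative only: processes are numbered from 0 rather than 1, a computation is represented just by its timestamps (`comp[i][k]` is the timestamp of the $(k+1)$'th local state of process `i`), and the sample timestamps are made up. They assume that a process counts its current state in its own component, the convention under which $c[i][k](vt)[i] = k$.

```python
from itertools import product
from typing import List, Sequence, Tuple

Timestamp = Tuple[int, ...]


def concurrent(vt1: Timestamp, i1: int, vt2: Timestamp, i2: int) -> bool:
    """Constant-time test: state of process i1 (vt1) is concurrent with state of i2 (vt2)."""
    return vt2[i2] >= vt1[i2] and vt1[i1] >= vt2[i1]


def is_consistent(vts: Sequence[Timestamp]) -> bool:
    """vts[i] is the timestamp of the local state chosen for process i."""
    n = len(vts)
    return all(vts[i][i] >= vts[j][i] for i in range(n) for j in range(n))


def consistent_global_states(comp: List[List[Timestamp]]) -> List[Tuple[int, ...]]:
    """Brute-force enumeration; each result is a tuple of 1-based local state indices."""
    ranges = [range(len(history)) for history in comp]
    return [
        tuple(k + 1 for k in ks)
        for ks in product(*ranges)
        if is_consistent([comp[i][k] for i, k in enumerate(ks)])
    ]


# Hypothetical computation: two processes with two states each; the first process
# sends a message after its first state, and the second process receives it just
# before its second state.
comp = [
    [(1, 0), (2, 0)],
    [(0, 1), (2, 2)],
]
cgs = consistent_global_states(comp)
```

The enumeration visits every combination of local states, which is the exponential behaviour the rest of the paper sets out to avoid; it is shown here only to make the definitions concrete.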

A computation $c$ satisfies Poss $\Phi$, denoted $c \models$ Poss $\Phi$, iff there exists a CGS
of $c$ that satisfies $\Phi$.

Introduce a partial order on global states: $s_1 \preceq_G s_2 = (\forall i \in [1..N] :
s_1[i] = s_2[i] \vee s_1[i] \to s_2[i])$. A history consistent with a computation $c$ is a finite
or infinite sequence $\sigma$ of consistent global states of $c$ such that, with respect to
$\preceq_G$: (i) $\sigma[1]$ is minimal; (ii) for all $k \in [1..|\sigma|-1]$, $\sigma[k+1]$ is an immediate
successor$^1$ of $\sigma[k]$; and (iii) if $\sigma$ is finite, then $\sigma[|\sigma|]$ is maximal.

Fig. 1. Computation $c_0$.

A computation $c$ satisfies Def $\Phi$, denoted $c \models$ Def $\Phi$, iff every history
consistent with $c$ contains a global state satisfying $\Phi$.
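Both modalities can be evaluated by plain traversal of the CGS lattice, which is the unoptimized baseline. The sketch below makes several assumptions that are not in the text: indices are 0-based, a computation is given only by its timestamps, a predicate is a Python function of the tuple of current state indices, and an immediate successor of a CGS is taken to be a CGS obtained by advancing exactly one process by one local state. The timestamps follow the convention that a process counts its current state in its own component.

```python
from typing import Callable, Iterator, List, Tuple

Timestamp = Tuple[int, ...]
Computation = List[List[Timestamp]]
Indices = Tuple[int, ...]


def consistent(comp: Computation, ks: Indices) -> bool:
    n = len(ks)
    return all(comp[i][ks[i]][i] >= comp[j][ks[j]][i] for i in range(n) for j in range(n))


def successors(comp: Computation, ks: Indices) -> Iterator[Indices]:
    """CGSs reached by advancing one process by one local state."""
    for i in range(len(ks)):
        if ks[i] + 1 < len(comp[i]):
            nxt = ks[:i] + (ks[i] + 1,) + ks[i + 1:]
            if consistent(comp, nxt):
                yield nxt


def poss(comp: Computation, phi: Callable[[Indices], bool]) -> bool:
    """Some consistent global state satisfies phi."""
    start = (0,) * len(comp)
    seen, stack = {start}, [start]
    while stack:
        ks = stack.pop()
        if phi(ks):
            return True
        for nxt in successors(comp, ks):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def definitely(comp: Computation, phi: Callable[[Indices], bool]) -> bool:
    """Every history meets phi: no path of phi-violating CGSs reaches the final state."""
    start = (0,) * len(comp)
    final = tuple(len(history) - 1 for history in comp)
    if phi(start):
        return True
    seen, stack = {start}, [start]
    while stack:
        ks = stack.pop()
        if ks == final:
            return False  # a complete history avoided phi
        for nxt in successors(comp, ks):
            if nxt not in seen and not phi(nxt):
                seen.add(nxt)
                stack.append(nxt)
    return True


# Hypothetical computations with two processes of two states each.
no_messages = [[(1, 0), (2, 0)], [(0, 1), (0, 2)]]
one_message = [[(1, 0), (2, 0)], [(0, 1), (2, 2)]]  # process 0 sends, process 1 receives


def target(ks: Indices) -> bool:
    """Process 0 is in its second state while process 1 is still in its first."""
    return ks == (1, 0)
```

Without messages the target state is possible but avoidable, because process 1 may move first; with the message, process 1 cannot move until process 0 has, so every history passes through the target.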

Example. Consider the computation $c_0$ shown in Figure 1. Horizontal lines
correspond to processes; diagonal lines, to messages. Dots represent events. Each
process has a variable $vt$ containing the vector time. Variable $p_i$ contains the
rest of process $i$’s local state. Let $s_{k_1,k_2}$ denote the global state comprising
the $k_1$’th local state of process 1 and the $k_2$’th local state of process 2. The
CGSs of $c_0$ are $\{s_{1,1}, s_{2,1}, s_{2,2}, s_{2,3}, s_{3,3}, s_{2,4}, s_{3,4}\}$. Some properties of this
computation are: $c_0 \models$ Poss $(p_1 = Y \wedge p_2 = D)$, $c_0 \models$ Def $(p_1 = Y \wedge p_2 = B)$,
$c_0 \not\models$ Def $(p_1 = Y \wedge p_2 = D)$, and $c_0 \not\models$ Poss $(p_1 = X \wedge p_2 = B)$.

3 Background on Partial-Order Methods 

The material in this section is paraphrased from [God96]. Beware! The system 
model in this section differs from the model of distributed computations in the 
previous section. For example, “state” has different meanings in the two models. 
Sections 4 and 5 give mappings from the former model to the latter. 

A concurrent system is a collection of finite-state automata that interact via
shared variables (more generally, shared objects). More formally, a concurrent
system is a tuple $\langle \mathcal{P}, \mathcal{O}, \mathcal{T}, s_{init} \rangle$, where

— $\mathcal{P}$ is a set $\{P_1, \ldots, P_n\}$ of processes. A process is a finite set of control points.

— $\mathcal{O}$ is a set of shared variables.

— $\mathcal{T}$ is a set of transitions. A transition is a tuple $\langle L_1, G, C, L_2 \rangle$, where: $L_1$ is
a set of control points, at most one from each process; $L_2$ is a set of control
points of the same processes as $L_1$, and with at most one control point from
each process; $G$ is a guard, i.e., a boolean-valued expression over the shared
variables; and $C$ is a command, i.e., a sequence of operations that update
the shared variables.

— $s_{init}$ is the initial state of the system.

$^1$ For a reflexive or irreflexive partial order $\langle S, \prec \rangle$ and elements $x \in S$ and $y \in S$, $y$ is
an immediate successor of $x$ iff $x \neq y \wedge x \prec y \wedge \neg(\exists z \in S \setminus \{x, y\} : x \prec z \wedge z \prec y)$.

A (global) state is a tuple $\langle L, V \rangle$, where $L$ is a collection of control points, one
from each process, and $V$ is a collection of values, one for each shared variable.
For a state $s$ and a shared variable $v$, we abuse notation and write $s(v)$ to denote
the value of $v$ in $s$ (the same notation is used in Section 2 but with a different
definition of “state”). Similarly, for a state $s$ and a predicate $\phi$, we write $s(\phi)$ to
denote the value of $\phi$ in $s$. A transition $\langle L_1, G, C, L_2 \rangle$ is enabled in state $\langle L, V \rangle$ if
$L_1 \subseteq L$ and $G$ evaluates to true using the values in $V$. Let $enabled(s)$ denote the
set of transitions enabled in $s$. If a transition $\langle L_1, G, C, L_2 \rangle$ is enabled in state
$s = \langle L, V \rangle$, then it can be executed in $s$, leading to the state $\langle (L \setminus L_1) \cup L_2, C(V) \rangle$,
where $C(V)$ represents the new values obtained by using the operations in $C$ to
update the values in $V$. We write $s \xrightarrow{t} s'$ to indicate that transition $t$ is enabled
in state $s$ and executing $t$ in $s$ leads to state $s'$.

An execution of a concurrent system is a finite or infinite sequence
$s_1 \xrightarrow{t_1} s_2 \xrightarrow{t_2} s_3 \cdots$ such that $s_1 = s_{init}$ and for all $i$, $s_i \xrightarrow{t_i} s_{i+1}$. A state is reachable (in
a system) if it appears in some execution (of that system).

Suppose we wish to find all the “deadlocks” of a system. Following Godefroid
(but deviating from standard usage), a deadlock is a state in which no transitions
are enabled. Clearly, all reachable deadlocks can be identified by exploring all
reachable states. This involves explicitly considering all possible execution
orderings of transitions, even if some transitions are “independent” (i.e., executing
them in any order leads to the same state; formal definition appears in the
Appendix). Exploring one interleaving of independent transitions is sufficient for finding
deadlocks. This does cause fewer intermediate states (i.e., states in which some
but not all of the independent transitions have been executed) to be explored,
but it does not affect reachability of deadlocks, because the intermediate states
cannot be deadlocks, because some of the independent transitions are enabled in
those states. Partial-order methods attempt to eliminate exploration of multiple
interleavings of independent transitions, thereby saving time and space.

A set $T$ of transitions enabled in a state $s$ is persistent in $s$ if, for every
sequence of transitions starting from $s$ and not containing any transitions in
$T$, all transitions in that sequence are independent with all transitions in $T$.
Formally, a set $T \subseteq enabled(s)$ is persistent in $s$ iff, for all nonempty sequences of
transitions $s_1 \xrightarrow{t_1} s_2 \xrightarrow{t_2} s_3 \cdots s_n \xrightarrow{t_n} s_{n+1}$, if $s_1 = s$ and $(\forall i \in [1..n] : t_i \notin T)$,
then $t_n$ is independent in $s_n$ with all transitions in $T$. As shown in [God96], in
order to find all reachable deadlocks, it suffices to explore from each state $s$ a
set of transitions that is persistent in $s$. State-space search algorithms that do
this are called persistent-set selective search (PSSS). Note that $enabled(s)$ is
trivially persistent in $s$. To save time and space, small persistent sets should be
used.
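The search loop itself is short. The sketch below is a generic illustration, not Godefroid's implementation: the function `selective_search`, its callback parameters, and the toy system are assumptions introduced here. The caller supplies a function returning a persistent set for each state; that set is assumed to be non-empty whenever some transition is enabled, so that a state with an empty set is a genuine deadlock.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def selective_search(
    init: Hashable,
    persistent: Callable[[Hashable], Iterable],
    execute: Callable[[Hashable, object], Hashable],
) -> Tuple[List[Hashable], Set[Hashable]]:
    """Explore only the transitions in persistent(s); return the deadlocks and visited states."""
    seen: Set[Hashable] = {init}
    stack: List[Hashable] = [init]
    deadlocks: List[Hashable] = []
    while stack:
        s = stack.pop()
        transitions = list(persistent(s))
        if not transitions:
            deadlocks.append(s)
        for t in transitions:
            nxt = execute(s, t)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return deadlocks, seen


# Toy system: two independent processes, each with a single step from 0 to 1.
def enabled(s: Tuple[int, ...]) -> List[int]:
    return [i for i, bit in enumerate(s) if bit == 0]


def step(s: Tuple[int, ...], i: int) -> Tuple[int, ...]:
    return s[:i] + (1,) + s[i + 1:]


full_deadlocks, full_seen = selective_search((0, 0), enabled, step)
# Because the two steps are independent, any single enabled step is persistent.
reduced_deadlocks, reduced_seen = selective_search((0, 0), lambda s: enabled(s)[:1], step)
```

Both searches find the same deadlock, but the reduced search visits three states instead of four; with $n$ independent processes the gap grows from $2^n$ to $n + 1$.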



4 Detecting Poss $\Phi$

Given a computation $c$ and a predicate $\Phi$, we construct a concurrent system
whose executions correspond to histories consistent with $c$, express $c \models$ Poss $\Phi$
as a question about reachable deadlocks of that system, and use PSSS to answer
that question. The system has one transition for each pair of consecutive local
states in $c$, plus a transition $t_0$ whose guard is $\Phi$. $t_0$ disables all transitions, so
it always leads to a deadlock.

Each process has a distinct control point corresponding to each of its local
states. The control point corresponding to the $k$’th local state of process $i$ is
denoted $\ell_{i,k}$. Thus, for $i \in [1..N]$, process $i$ is $P_i = \bigcup_{k=1..|c[i]|} \{\ell_{i,k}\}$. We introduce
a new process, called process 0, that monitors $\Phi$. Process 0 has a single transition
$t_0$, which changes the control point of process 0 from $\ell_{0,nd}$ (mnemonic for “not
detected”) to $\ell_{0,d}$ (“detected”). Thus, process 0 is $P_0 = \{\ell_{0,nd}, \ell_{0,d}\}$. The set of
processes is $\mathcal{P} = \bigcup_{i=0..N} \{P_i\}$. Initially, process 0 is at control point $\ell_{0,nd}$, and
for $i > 0$, process $i$ is at control point $\ell_{i,1}$.

The local state of process $i$ is stored in a shared variable $p_i$. The initial value
of $p_i$ is $c[i][1]$. For convenience, the index of the current local state of process $i$ is
stored in a shared variable $\tau_i$. The initial value of $\tau_i$ is 1, and $\tau_i$ is incremented
by each transition of process $i$. Whenever process $i$ is at control point $\ell_{i,k}$, $\tau_i$
equals $k$. The set of shared variables is $\mathcal{O} = \bigcup_{i=1..N} \{p_i, \tau_i\}$.

Transition $t_{i,k}$ takes process $i$ from its $k$’th local state to its $(k+1)$’th local
state. $t_{i,k}$ is enabled when process $i$ is at control point $\ell_{i,k}$, process 0 is at control
point $\ell_{0,nd}$, $\tau_i$ equals $k$, and the $(k+1)$’th local state of process $i$ is concurrent
with the current local states of the other processes. The set of transitions is
$\mathcal{T} = \{t_0\} \cup \bigcup_{i=1..N,\, k=1..|c[i]|-1} \{t_{i,k}\}$, where
$t_0 = \langle \{\ell_{0,nd}\}, \Phi(p_1, \ldots, p_N), skip, \{\ell_{0,d}\} \rangle$ and

$$
\begin{array}{l}
t_{i,k} = \langle \{\ell_{i,k}, \ell_{0,nd}\}, \\
\quad \tau_i = k \wedge (\forall j \in [1..N] \setminus \{i\} : c[j][\tau_j](vt)[j] \ge c[i][k+1](vt)[j]), \\
\quad p_i := c[i][k+1];\ \tau_i := k+1, \\
\quad \{\ell_{i,k+1}, \ell_{0,nd}\} \rangle
\end{array}
$$

The guard can be simplified by noting that $c[i][\tau_i](vt)[i]$ always equals $\tau_i$. It
is easy to show that $\ell_{0,d}$ is reachable iff a state satisfying $\Phi$ is reachable, and
that all states containing $\ell_{0,d}$ are deadlocks. Thus, $c \models$ Poss $\Phi$ iff a deadlock
containing $\ell_{0,d}$ is reachable.

Example. The transitions of the concurrent system corresponding to $c_0$ of Figure
1 are $t_0$ and

An alternative is to construct a transition system similar to the one above
but in which both occurrences of $\ell_{0,nd}$ in $t_{i,k}$ are deleted. As before, $c \models$ Poss $\Phi$
iff $\ell_{0,d}$ is reachable (or, equivalently, $t_0$ is reachable), but now, states containing
$\ell_{0,d}$ are not necessarily deadlocks. PSSS can be used to determine reachability
of control points (or transitions), provided the dependency relation is weakly
uniform [God96, Section 6.3]. Showing that the dependency relation is weakly
uniform requires more effort than including $\ell_{0,nd}$ in $t_{i,k}$ and has no benefit: we
obtain essentially the same detection algorithm either way.

We give a simple algorithm $PS_{Poss}$ that efficiently computes a small
persistent set in a state $s$ by exploiting the structure of $\Phi$. Without loss of generality,
we write $\Phi$ as a conjunction: $\Phi = \bigwedge_{i=1..n} \phi_i$, with $n \ge 1$. The support of a
formula $\phi$, denoted $supp(\phi)$, is the set of processes on whose local states $\phi$ depends.
Suppose $\Phi$ is true in $s$. Then $PS_{Poss}(s)$ returns $enabled(s)$. When such a state
is reached, there is no need to try to find a small persistent set, because we can
immediately halt the search and return “$c \models$ Poss $\Phi$”. We return “$c \not\models$ Poss $\Phi$”
if the selective search terminates without encountering a state satisfying $\Phi$; by
construction, this is equivalent to unreachability of deadlocks containing $\ell_{0,d}$.
Suppose $\Phi$ is false in $s$. The handling of this case is based on the following
theorem, a proof of which appears in the Appendix.

Theorem 1. Suppose $\Phi$ is false in $s$. Let $T$ be a subset of $enabled(s)$ such that,
for all sequences of transitions starting from $s$ and staying outside $T$, $\Phi$ remains
false; more precisely, for all sequences of transitions $s_1 \xrightarrow{t_1} s_2 \xrightarrow{t_2} \cdots s_n \xrightarrow{t_n} s_{n+1}$,
if $s_1 = s$ and $(\forall i \in [1..n] : t_i \notin T)$, then $\Phi$ is false in $s_{n+1}$. Then $T$ is persistent
in $s$.

To construct such a set $T$, choose some conjunct $\phi$ of $\Phi$ that is false in $s$.
Clearly, $\Phi$ cannot become true until $\phi$ does, and $\phi$ cannot become true until
the next transition of some process in $supp(\phi)$ is executed. Note that the next
transition of process $i$ must be $t_{i,s(\tau_i)}$. Thus, for each process $i$ in
$supp(\phi)$, if process $i$ is not in its final state (i.e., $s(\tau_i) < |c[i]|$), and if
the next transition $t_{i,s(\tau_i)}$ of process $i$ is enabled, then add $t_{i,s(\tau_i)}$
to $T$; otherwise find some enabled transition $t$ that must execute before
$t_{i,s(\tau_i)}$ and add $t$ to $T$. To find such a $t$, we introduce the wait-for graph
$WF(s)$, which has nodes $[1..N]$, and has an edge from $i$ to $j$ if the next transition
of process $j$ must execute before the next transition of process $i$, i.e., if
$s(\tau_j) < c[i][s(\tau_i)+1](vt)[j]$ (we call this a wait-for graph because of its
similarity to the wait-for graphs used for deadlock detection [SG98, Section 7.6.1]).
Such a transition $t$ can be found by starting at node $i$ in $WF(s)$, following any
path until a node $j$ with no outedges is reached, and taking $t$ to be
$t_{j,s(\tau_j)}$. The case in which $t_{i,s(\tau_i)}$ is enabled is a special case of
this construction, corresponding to a path of length zero, which implies $i = j$.
Pseudo-code appears in Figure 2.
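The construction of $T$ via the wait-for graph can be sketched in Python as follows. This is only an illustrative sketch of the case in which $\Phi$ is false, under assumptions that are not from the paper: processes are numbered from 0, `tau[i]` holds the 1-based index of process i's current local state, `vt[i][k]` is the vector timestamp of its k-th local state, self-edges are ignored, and the wait-for graph is assumed acyclic. The function names and the two-process computation are hypothetical.

```python
def waits_for(i, tau, vt):
    """Out-edges of node i in the wait-for graph WF(s).

    Assumed encoding: tau[i] is the 1-based index of process i's current
    local state; vt[i][k] is the vector timestamp of its k-th local state.
    """
    next_ts = vt[i][tau[i] + 1]
    return [j for j in range(len(tau)) if j != i and tau[j] < next_ts[j]]


def ps_poss_else_branch(tau, vt, num_states, support):
    """Case "Phi is false in s": pick one transition per supporting process."""
    chosen = set()
    for i in support:
        if tau[i] >= num_states[i]:      # process i is in its final state
            continue
        j = i
        while True:
            out = waits_for(j, tau, vt)
            if not out:                  # node j has no outedges
                break
            j = out[0]                   # "follow any path": take the first edge
        chosen.add((j, tau[j]))          # the pair stands for t_{j, s(tau_j)}
    return chosen


# Hypothetical computation: process 1 enters its second local state by
# receiving a message that process 0 sent on entering its own second state.
vt = [
    {1: [1, 0], 2: [2, 0], 3: [3, 0]},
    {1: [0, 1], 2: [2, 2], 3: [2, 3]},
]
print(ps_poss_else_branch([1, 1], vt, [3, 3], support=[1]))
```

In the initial state, process 1 is waiting for process 0, so the path from node 1 ends at node 0 and the single transition of process 0 is selected.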

The wait-for graph can be incrementally maintained in $O(N)$ time per
transition. Let $d = \max(|supp(\phi_1)|, \ldots, |supp(\phi_n)|)$. $PS_{Poss}(s)$ returns a
set of size at most $d$ if $\Phi$ is false in $s$. Computing $PS_{Poss}(s)$ takes $O(Nd)$
time, because the algorithm follows at most $d$ paths of length at most $N$ in the
wait-for graph. Thus, the overall time complexity of the search is $O(N d N_e)$,
where $N_e$ is the number of states explored by the algorithm.

    if ($\Phi$ holds in $s$) $\vee$ $enabled(s) = \emptyset$ then return $enabled(s)$
    else choose some conjunct $\phi$ of $\Phi$ such that $\phi$ is false in $s$;
         $T := \emptyset$;
         for all $i$ in $supp(\phi)$ such that $s(\tau_i) < |c[i]|$
             start at node $i$ in $WF(s)$;
             follow any path until a node $j$ with no outedges is reached;
             insert $t_{j,s(\tau_j)}$ in $T$;
         return $T$

Fig. 2. Algorithm $PS_{Poss}(s)$.

Suppose $\Phi$ is a conjunction of 1-local predicates. Then $PS_{Poss}$ returns sets
of size at most 1, except when $\Phi$ is true in $s$, in which case the search is halted
immediately, as described above. Thus, at most one transition is explored from
each state. Also, the system has a unique initial state. Thus, the algorithm
explores one linear sequence of transitions. Each transition in $\mathcal{T}$ appears at
most once in that sequence, because $t_0$ disables all transitions, and each
transition $t_{i,k}$ permanently disables itself. $|\mathcal{T}|$ is $O(NS)$, so $N_e$ is also
$O(NS)$, so the overall time complexity is $O(N^2 S)$.

Example. In evaluation of $c_0 \models$ Poss $(p_1 = X \wedge p_2 = B)$, $t_{1,1}$, $t_{2,1}$, and
$t_{2,2}$ are executed. In the resulting state $s_{2,3}$, $PS_{Poss}(s_{2,3})$ may return
$\{t_{1,2}\}$ or $\{t_{2,3}\}$, causing $s_{2,4}$ or $s_{3,3}$, respectively, to be never visited.

Example. Consider evaluation of $c \models$ Poss $\phi_1(p_1) \wedge \phi_2(p_2, p_3)$, for
predicates $\phi_1$ and $\phi_2$ such that $\phi_1$ is true in most states of process 1, and
$\phi_2$ is true in at most $O(S)$ consistent states of processes 2 and 3. PSSS does not
explore transitions of process 1 in states where $\phi_2$ is false and $\phi_1$ is true, so
its worst-case running time for such systems is $O(S^2)$. Garg and Waldecker’s
algorithm [GW94] is inapplicable, because $\phi_2$ is not 1-local. The worst-case
running time of Cooper and Marzullo’s algorithm [CM91] for such systems is $O(S^3)$.

Exploring Sequences of Transitions. The following optimization can be used to
reduce the number of explored states and transitions. In the else branch of
$PS_{Poss}$, if the next transition of process $i$ is not enabled, and if process $i$ is not
waiting for any other process in $supp(\phi)$, then insert in $T$ a minimum-length
sequence $w$ of transitions that ends with a transition of process $i$. For details
and an example, see [SUL99].

On-line Detection. For simplicity, the above presentation considers off-line
property detection. Our approach can also be applied to on-line property detection.
Local states arrive at the monitor one at a time. For each process, the local
states of that process arrive in the order they occurred. However, there is no
constraint on the relative arrival order of local states of different processes. For
on-line detection of Poss $\Phi$, detection must be announced as soon as local states
comprising a CGS satisfying $\Phi$ have arrived. This is easily achieved by modifying
the selective search algorithm to explore transitions as they become available:
there is no need for the selective search to proceed in depth-first order, so the
stack can be replaced with a “to-do set”, and in each iteration, any element of
that set can be selected. This does not affect the time or space complexity of
the algorithm.
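The replacement of the stack by a “to-do set” can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: the callback names (`persistent_set`, `execute`, `satisfies`) are hypothetical, states are assumed hashable, and the incremental arrival of local states at the monitor is not modelled.

```python
def selective_search(initial, persistent_set, execute, satisfies):
    """Explore only the transitions in persistent_set(s), in arbitrary order."""
    todo = {initial}          # plays the role of the depth-first stack
    seen = {initial}
    while todo:
        s = todo.pop()        # any element of the to-do set may be selected
        if satisfies(s):
            return True       # a state satisfying the predicate was reached
        for t in persistent_set(s):
            nxt = execute(s, t)
            if nxt not in seen:
                seen.add(nxt)
                todo.add(nxt)
    return False


# Toy system (purely illustrative): states are integers, and the only
# transition adds 1 while the state is below 3.
found = selective_search(
    0,
    persistent_set=lambda s: [1] if s < 3 else [],
    execute=lambda s, t: s + t,
    satisfies=lambda s: s == 3,
)
print(found)
```

Because the search never relies on the order in which states leave the set, the same loop works whether elements are added all at once or as they become available.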



5 Detecting Def $\Phi$

As in Section 4, we construct a concurrent system whose executions correspond
to histories consistent with $c$, express $c \models$ Def $\Phi$ as a question about reachable
deadlocks of that system, and use PSSS to answer that question. The construction
in Section 4 for Poss is similar to well-known constructions for reducing
safety properties to deadlock detection [God96], but our construction for Def
seems novel. The transitions are similar to those in (1), except the guard of each
transition is augmented to check whether the transition would truthify $\Phi$; if so,
the transition is disabled. If $s_{init}(\Phi)$ holds, then $c \models$ Def $\Phi$, and no search is
performed. Thus, in a search, $\Phi$ is false in all reachable states. If the final state
(i.e., the state satisfying $\bigwedge_{i=1..N} \tau_i = |c[i]|$) is reachable, then each sequence of
transitions from $s_{init}$ to the final state corresponds to a history consistent with
$c$ and in which $\Phi$ never holds, so $c \not\models$ Def $\Phi$.

The processes are the same as in Section 4, except with process 0 omitted.
Thus, $\mathcal{P} = \bigcup_{i=1..N} \{P_i\}$, where $P_i = \bigcup_{k=1..|c[i]|} \{\ell_{i,k}\}$. The shared
variables are the same as in Section 4. Thus, $\mathcal{O} = \bigcup_{i=1..N} \{p_i, \tau_i\}$. The
initial state is the same as in Section 4, except with the control point for
process 0 omitted. The transitions are
$\mathcal{T} = \bigcup_{i=1..N,\,k=1..|c[i]|-1} \{t_{i,k}\}$, where

$$
\begin{aligned}
t_{i,k} = \langle\ & \{\ell_{i,k}\}, \\
& \tau_i = k \wedge (\forall j \in [1..N] \setminus \{i\} : c[j][\tau_j](vt)[j] \ge c[i][k+1](vt)[j]) \\
& \quad \wedge\ \neg\Phi(p_1, \ldots, p_{i-1}, c[i][\tau_i + 1], p_{i+1}, \ldots, p_N), \\
& p_i := c[i][k+1];\ \tau_i := k+1, \\
& \{\ell_{i,k+1}\}\ \rangle
\end{aligned}
\tag{3}
$$

It is easy to show that $c \not\models$ Def $\Phi$ iff $\Phi$ is false in $s_{init}$ and the final state is a
reachable deadlock.
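The guard in (3) can be read as two checks: the causal-ordering condition on vector timestamps, and the extra condition that the step would not make $\Phi$ true. The following Python sketch illustrates that reading under assumed encodings that are not from the paper: processes are numbered from 0, `c[i][k]` is the k-th local state of process i (1-based), `vt[i][k]` is its vector timestamp, and `phi` is a Python predicate over the list of current local states. The example computation is hypothetical.

```python
def def_guard(i, k, tau, p, c, vt, phi):
    """Guard of t_{i,k} in the Def construction (illustrative encoding)."""
    if tau[i] != k:
        return False
    next_ts = vt[i][k + 1]
    # Causal ordering: every other process j must have advanced far enough.
    for j in range(len(tau)):
        if j != i and vt[j][tau[j]][j] < next_ts[j]:
            return False
    # Extra conjunct for Def: the step must not make Phi true.
    after = list(p)
    after[i] = c[i][k + 1]
    return not phi(after)


# Hypothetical two-process computation and predicate.
c = [{1: "W", 2: "X", 3: "Y"}, {1: "A", 2: "B", 3: "C"}]
vt = [
    {1: [1, 0], 2: [2, 0], 3: [3, 0]},
    {1: [0, 1], 2: [2, 2], 3: [2, 3]},
]
phi = lambda p: p[0] == "X" and p[1] == "B"

# In the state with local states X and A, the next step of process 1
# would make phi true, so its guard is false.
print(def_guard(1, 1, [2, 1], ["X", "A"], c, vt, phi))
```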



Example. The transitions of the concurrent system corresponding to computa- 

Our algorithm $PS_{Def}$ for computing persistent sets of such concurrent
systems applies when $\Phi$ is a conjunction of 1-local predicates:
$\Phi = \bigwedge_{i=1..N} \phi_i$, where $supp(\phi_i) = \{i\}$. Pseudo-code for $PS_{Def}$ appears in
Figure 3, where



stayfalse{i,k) = ->c[i][k]{4>i) A-ic[z][fc-|- 



(5) 
    if $enabled(s) = \emptyset$ then return $enabled(s)$
    else choose some $i \in [1..N]$ such that $stayfalse(i, s(\tau_i))$;
         start at node $i$ in $WF(s)$;
         follow any path until a node $j$ with no outedges is reached;
         return $\{t_{j,s(\tau_j)}\}$

Fig. 3. Algorithm $PS_{Def}(s)$.



Theorem 2. $PS_{Def}(s)$ is well-defined and is persistent in $s$.

Proof: To show that $PS_{Def}(s)$ is well-defined, we show that the following
formulas hold in the else branch:

$(\exists i \in [1..N] : stayfalse(i, s(\tau_i)))$    (6)

$t_{j,s(\tau_j)} \in enabled(s)$.    (7)

Proofs that these formulas hold and that $PS_{Def}(s)$ is persistent in $s$ appear in
the Appendix. □

The worst-case time complexity of $PS_{Def}$ is the same as $PS_{Poss}$ for
conjunctions of 1-local predicates, namely $O(N)$. Thus, the overall time complexity
of the search is $O(N N_e)$, where $N_e$ is the number of states explored by the
algorithm. By the same reasoning as for $PS_{Poss}$, $N_e$ is $O(NS)$. Thus, the overall
time complexity is $O(N^2 S)$.

Example. In evaluation of $c_0 \models$ Def $(p_1 = Y \wedge p_2 = D)$, $t_{1,1}$, $t_{2,1}$, and
$t_{2,2}$ are executed. In the resulting state $s_{2,3}$, $t_{2,3}$ is disabled by its guard
(executing $t_{2,3}$ would truthify $\Phi$), so $PS_{Def}(s_{2,3})$ returns $\{t_{1,2}\}$.

On-line Detection. For on-line detection of Def $\Phi$, detection must be announced
as soon as all histories consistent with the known prefix of the computation
contain a CGS satisfying $\Phi$. As for on-line detection of Poss $\Phi$, this is easily
achieved by modifying the selective search algorithm to explore transitions as they
become available, i.e., by replacing the stack with a “to-do set”. The algorithm
announces that Def $\Phi$ holds whenever the “to-do set” becomes empty.

6 Examples 

We implemented our algorithm for detecting Poss $\Phi$ in Java and applied it to
two examples.

In the first example, called database partitioning, a database is partitioned
among processes 2 through $N$, while process 1 assigns tasks to these processes
based on the current partition. Each process $i \in [1..N]$ has a variable $partn_i$
containing the current partition. A process $i \in [2..N]$ can suggest a new partition
at any time by setting variable $chg_i$ to true and broadcasting a message
containing the proposed partition and an appropriate version number. A recipient of
this message accepts the proposed partition if its own version of the partition
has a smaller version number or if its own version of the partition has the same
version number and was proposed by a process $i'$ with $i' > i$. An invariant $I_{db}$
that should be maintained is: if no process is changing the partition, then all
processes agree on the partition.

$I_{db} = \Bigl(\bigwedge_{i \in [2..N]} \neg chg_i\Bigr) \Rightarrow \bigwedge_{i,j \in [1..N]} partn_i = partn_j$    (8)

The second example, called primary-secondary, concerns an algorithm
designed to ensure that the system always contains a pair of processes that will act
together as primary and secondary (e.g., for servicing requests). This is expressed
by the invariant

$I_{pr} = \bigvee_{i,j \in [1..N],\, i \neq j} isPrimary_i \wedge isSecondary_j \wedge secondary_i = j \wedge primary_j = i.$    (9)

Initially, process 1 is the primary and process 2 is the secondary. At any time, 
the primary may choose a new primary as its successor by first informing the 
secondary of its intention, waiting for an acknowledgment, and then multicasting 
to the other processes a request for volunteers to be the new primary. It chooses 
the first volunteer whose reply it receives and sends a message to that process 
stating that it is the new primary. The new primary sends a message to the 
current secondary which updates its state to reflect the change and then sends 
a message to the old primary stating that it can stop being the primary. The 
secondary can choose a new secondary using a similar protocol. The secondary 
must wait for an acknowledgment from the primary before multicasting the 
request for volunteers; however, if the secondary receives instead a message that 
the primary is searching for a successor, the secondary aborts its current attempt 
to find a successor, waits until it receives a message from the new primary, and 
then re-starts the protocol. 

We implemented a simulator that generates computations of these protocols,
and we used state-space search to detect possible violations of the given invariant
in those computations, i.e., to detect Poss $\neg I_{db}$ or Poss $\neg I_{pr}$. To apply
$PS_{Poss}$, we write both predicates as conjunctions. For $\neg I_{db}$, we rewrite the
implication $P \Rightarrow Q$ as $\neg P \vee Q$ and then use DeMorgan’s Law (applied to the
outermost negation and the disjunction). For $\neg I_{pr}$, we simply use DeMorgan’s Law.
The simulator accepts $N$ and $S$ as arguments and halts when some process has
executed $S - 1$ events. Message latencies and other delays are selected randomly
from the distribution $1 + exp(1)$, where $exp(x)$ is the exponential distribution
with mean $x$. The search optionally uses sleep sets, as described in [God96], as
a further optimization. Sleep sets help eliminate redundancy caused by exploring
multiple interleavings of independent transitions in a persistent set. Sleep
sets are particularly effective for Poss $\Phi$ because, if $\Phi$ does not hold in $s$, then
transitions in $PS_{Poss}(s)$ are pairwise independent.
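The rewriting of the two negated invariants into conjunctions can be written out as the short derivation below. It is a sketch of the steps named in the text, with $P$ and $Q$ standing for the antecedent and consequent of $I_{db}$ and with index ranges left implicit (they are those of (8) and (9)).

```latex
\begin{align*}
\neg I_{db} &= \neg (P \Rightarrow Q) = \neg(\neg P \vee Q) = P \wedge \neg Q, \\
            &\quad \text{with } P = \bigwedge_{i} \neg chg_i
              \text{ and } Q = \bigwedge_{i,j} partn_i = partn_j, \\
\neg I_{db} &= \Bigl(\bigwedge_{i} \neg chg_i\Bigr) \wedge
               \neg\Bigl(\bigwedge_{i,j} partn_i = partn_j\Bigr), \\
\neg I_{pr} &= \bigwedge_{i,j} \neg\bigl(isPrimary_i \wedge isSecondary_j
               \wedge secondary_i = j \wedge primary_j = i\bigr).
\end{align*}
```

In $\neg I_{db}$, each conjunct $\neg chg_i$ depends on a single process, while the conjunct $\neg Q$ depends on the partition variables of all processes; in $\neg I_{pr}$, each conjunct depends on the two processes $i$ and $j$.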

Search was done using four levels of optimization: no optimization,
persistent sets only, sleep sets only ([God96, Fig. 5.2] with $PS = enabled$), and both
persistent sets and sleep sets ([God96, Fig. 5.2] with $PS = PS_{Poss}$).

Data collected by fixing the value of $N$ at 3 or 5 and varying $S$ in the range
$[2..80]$ indicate that in all cases, $N_t$ and $N_s$ are linear in $S$. This is because both
examples involve global synchronizations, which ensure that a new local state of
any process is not concurrent with any very old local state of any process.

The following table contains measurements for the database partitioning
example with $N = 5$ and $S = 80$ and for the primary-secondary example with
$N = 9$ and $S = 60$. Using both persistent sets and sleep sets reduced $N_t$ (and,
roughly, the running time) by factors of 775 and 72, respectively, for the two
examples.



                      No optimization      Sleep              Persistent       Persis. + Sleep
Example               N_t       N_s        N_t      N_s       N_t     N_s      N_t     N_s
database partition    343170    88281      88280    88281     640     545      443     444
primary-secondary     3878663   752035     752034   752035    91874   61773    53585   53586
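The reduction factors quoted for $N_t$ can be rechecked directly from the table. The short Python sketch below only redoes that arithmetic; the dictionary layout and function name are illustrative choices, and the numbers are the $N_t$ entries for "no optimization" and "persistent + sleep sets".

```python
# N_t values copied from the table: (no optimization, persistent + sleep sets).
measurements = {
    "database partition": (343170, 443),
    "primary-secondary": (3878663, 53585),
}


def reduction_factor(unoptimized, optimized):
    """Ratio of explored transitions without and with the optimizations."""
    return unoptimized / optimized


for name, (before, after) in measurements.items():
    print(f"{name}: reduced by a factor of about {reduction_factor(before, after):.1f}")
```

Rounded to the nearest integer, the two ratios agree with the factors of 775 and 72 stated in the text.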



To help determine the dependence of $N_t$ on $N$, we graphed $\ln N_t$ vs. $N$
and fit a line to it; this corresponds to equations of the form $N_t = b e^{aN}$. The
results are graphed in Figure 4. The dependence on $S$ is linear, so using different
values of $S$ in different cases does not affect the dependence on $N$ (i.e., it affects
$b$ but not $a$). In one case, namely, the database partitioning example with both
persistent sets and sleep sets, a polynomial in $N$ yields a better fit
than an exponential for the measured region of $N \in [3..8]$. The dependence of
$N_s$ on $N$ is similar.
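Fitting a line to $\ln N_t$ versus $N$ amounts to an ordinary least-squares fit on the log-transformed counts. The sketch below shows one way to do this in plain Python, assuming the fitted form is $N_t = b e^{aN}$. The data points are synthetic, generated from an exact exponential purely to exercise the code; they are not measurements from the paper, and the function name is hypothetical.

```python
import math


def fit_exponential(ns, nts):
    """Least-squares line through (N, ln N_t); returns (a, b) with N_t ~ b*exp(a*N)."""
    ys = [math.log(v) for v in nts]
    count = len(ns)
    mean_x = sum(ns) / count
    mean_y = sum(ys) / count
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ns, ys)) / sum(
        (x - mean_x) ** 2 for x in ns
    )
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


# Synthetic data (NOT from the paper): exactly N_t = 2 * exp(0.5 * N).
ns = [3, 4, 5, 6, 7, 8]
nts = [2.0 * math.exp(0.5 * n) for n in ns]
a, b = fit_exponential(ns, nts)
print(a, b)
```

Changing $S$ rescales every $N_t$ by roughly the same factor, which shifts the intercept (and hence $b$) but leaves the slope $a$ unchanged.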




Fig. 4. Datapoints and fitted curves for $\ln N_t$ vs. $N$ for all four levels of
optimization. Left: Database partitioning example with $S = 25$ for the two searches
not using persistent sets and $S = 50$ for the two searches using persistent sets.
Right: Primary-secondary example with $S = 60$.



Garg and Waldecker’s algorithm for detecting Poss $\Phi$ for conjunctions of
1-local predicates [GW94] is not applicable to the database partitioning example,
because $\neg I_{db}$ contains clauses like $partn_i \neq partn_j$ which are not 1-local. Their
algorithm can be applied to the primary-secondary example by putting $\neg I_{pr}$ in
disjunctive normal form (DNF) and detecting each disjunct separately. $\neg I_{pr}$ is
compactly expressed in conjunctive normal form. Putting $\neg I_{pr}$ in DNF causes
an exponential blowup in the size of the formula. This leads to an exponential
factor in the time complexity of applying their algorithm to this problem.

Appendix: Selected Definitions and Proofs

Definition of Independence [God96]. Transitions $t_1$ and $t_2$ are independent in a
state $s$ if

1. Independent transitions can neither disable nor enable each other, i.e.,

   (a) if $t_1 \in enabled(s)$ and $s \xrightarrow{t_1} s'$, then $t_2 \in enabled(s)$ iff $t_2 \in enabled(s')$;

   (b) condition (a) with $t_1$ and $t_2$ interchanged holds; and

2. Enabled independent transitions commute, i.e., if $\{t_1, t_2\} \subseteq enabled(s)$,
   then there is a unique state $s'$ such that $s \xrightarrow{t_1} s_1 \xrightarrow{t_2} s'$ and
   $s \xrightarrow{t_2} s_2 \xrightarrow{t_1} s'$.

Proof of Theorem 1: It suffices to show that for each transition $t$ in $T$, $t$ is
independent with $t_n$ in $s_n$. $\Phi$ is false in $s$, so $t_0 \notin enabled(s)$, so $t$ is
$t_{i,k}$, as defined in (1), for some $i$ and $k$. Note that $t_n \neq t_0$, because by
hypothesis, $\Phi$ is false in $s_n$. The transitions of each process occur in the order
they are numbered, and $t_{i,k} \in enabled(s)$, so the next transition of process $i$ that
is executed after state $s$ is $t_{i,k}$; in other words, from state $s$, no transition of
process $i$ can occur before $t_{i,k}$ does. Since $t_{i,k} \in T$ and
$(\forall i \in [1..n] : t_i \notin T)$, it follows that $t_n$ is a transition of some process
$i'$ with $i' \neq i \wedge i' \neq 0$. From the structure of the system, it is easy to show
that, once $t_{i,k}$ has become enabled, such a transition $t_n$ cannot enable or disable
$t_{i,k}$ and commutes with $t_{i,k}$ when both are enabled. Thus, $t_{i,k}$ and $t_n$ are
independent in $s_n$. □







Proof of Theorem 2: First we show that (6) holds in the else branch. In
that branch, $enabled(s) \neq \emptyset$, so $enabled(s)$ contains some transition
$t_{j,s(\tau_j)}$. Let $s \xrightarrow{t_{j,s(\tau_j)}} s'$. Suppose $\phi_j$ is true in $s'$. The guard of
$t_{j,s(\tau_j)}$ implies that $\Phi$ is false in $s'$, so there exists $i \neq j$ such that $\phi_i$
is false in $s'$. A transition of process $j$ does not change the local state of
process $i$, so $s(p_i) = s'(p_i)$, so $\phi_i$ is false in $s$, so $stayfalse(i, s(\tau_i))$ holds.

Suppose $\phi_j$ is false in $s'$. If $\phi_j$ is false in $s$, then $stayfalse(j, s(\tau_j))$ holds.
Otherwise, suppose $\phi_j$ is true in $s$. Since $s$ is reachable, $\Phi$ is false in $s$, so
there exists $i \neq j$ such that $\phi_i$ is false in $s$. Note that $s(p_i) = s'(p_i)$, so $\phi_i$
is false in $s'$, so $stayfalse(i, s(\tau_i))$ holds.

Now we show that (7) holds in the else branch. By (6), $stayfalse(i, s(\tau_i))$
holds for some $i$. Let $j$ be the node with no outedges that was selected by
the algorithm. It suffices to show that executing $t_{j,s(\tau_j)}$ from state $s$ would not
cause $\Phi$ to become true. If $i = j$, then this follows immediately from the second
conjunct in $stayfalse(i, s(\tau_i))$. If $i \neq j$, then this follows immediately from the
first conjunct in $stayfalse(i, s(\tau_i))$ and the fact that $t_{j,s(\tau_j)}$ does not change the
local state of process $i$. Note that, in both cases, $\phi_i$ is false in $s'$.

Finally, we show that $PS_{Def}(s)$ is persistent in $s$. This is trivial if
$enabled(s) = \emptyset$. Suppose $enabled(s) \neq \emptyset$, so $stayfalse(i, s(\tau_i))$ holds. Let
$\{t_{j,s(\tau_j)}\} = PS_{Def}(s)$. Let
$s_1 \xrightarrow{t_1} s_2 \xrightarrow{t_2} s_3 \cdots\ s_n \xrightarrow{t_n} s_{n+1}$ such that $s_1 = s$ and
$(\forall k \in [1..n] : t_k \neq t_{j,s(\tau_j)})$. It suffices to show that $t_n$ and
$t_{j,s(\tau_j)}$ are independent in $s_n$. As noted at the end of the previous paragraph,
executing $t_{j,s(\tau_j)}$ in $s$ leaves $\phi_i$ false. $t_1, \ldots, t_n$ are not transitions of
process $i$, because the next transition of process $i$ cannot occur before
$t_{j,s(\tau_j)}$ occurs (this follows from the choice of $j$ and the definition of
wait-for graph). Thus,

$(\forall k \in [1..n+1] : s_k(p_i) = s(p_i) \wedge$ executing $t_{j,s(\tau_j)}$ in $s_k$ leaves $\phi_i$ false$)$.    (10)

Consider the requirements in the definition of independence.

(1a) Suppose $t_{j,s(\tau_j)} \in enabled(s_n)$. Let $s_n \xrightarrow{t_{j,s(\tau_j)}} s'$. By hypothesis,
$t_n \in enabled(s_n)$, so we need to show that $t_n \in enabled(s')$. Since $t_n$ is enabled
in $s_n$, it suffices to show that executing $t_n$ in $s'$ leaves $\Phi$ false. By (10),
executing $t_{j,s(\tau_j)}$ in $s_n$ leaves $\phi_i$ false, and $t_n$ is not a transition of
process $i$, so executing $t_n$ in $s'$ leaves $\phi_i$ false.

(1b) Suppose $t_n \in enabled(s_n)$. Recall that $s_n \xrightarrow{t_n} s_{n+1}$. We need to show that
$t_{j,s(\tau_j)} \in enabled(s_n)$ iff $t_{j,s(\tau_j)} \in enabled(s_{n+1})$. $t_{j,s(\tau_j)}$ is enabled
in $s$, and by (10), executing $t_{j,s(\tau_j)}$ in $s_n$ or in $s_{n+1}$ leaves $\phi_i$ and hence
$\Phi$ false. It follows that $t_{j,s(\tau_j)}$ is enabled in both $s_n$ and $s_{n+1}$.

(2) It is easy to show that $t_n$ and $t_{j,s(\tau_j)}$ commute.

Thus, $t_n$ and $t_{j,s(\tau_j)}$ are independent in $s_n$, and $PS_{Def}(s)$ is persistent in $s$. □

Abstract. Hierarchical state machines are a popular visual formalism
for software specifications. To apply automated analysis to such
specifications, the traditional approach is to compile them to existing model
checkers. Aimed at exploiting the modular structure more effectively, our
approach is to develop algorithms that work directly on the hierarchical
structure. First, we report on an implementation of a visual hierarchical
language with modular features such as nested modes, variable scoping,
mode reuse, exceptions, group transitions, and history. Then, we identify
a variety of heuristics to exploit these modular features during
reachability analysis. We report on an enumerative as well as a symbolic checker,
and case studies.



1 Introduction 

Recent advances in formal verification have led to powerful design tools for
hardware (see [CK96] for a survey), and subsequently, have brought a lot of hope of
their application to reactive programming. The most successful verification
technique has been model checking [CE81,QS82]. In model checking, the system is
described by a state-machine model, and is analyzed by an algorithm that
explores the reachable state-space of the model. The state-of-the-art model checkers
(e.g., Spin [Hol97] and SMV [McM93]) employ a variety of heuristics for efficient
search, but are typically unable to analyze models with more than a hundred state
variables, and thus, scalability still remains a challenge. A promising approach
to address scalability is to exploit the modularity of design. Modern software
engineering methodologies such as UML [BJR97] exhibit two kinds of modular
structures, architectural and behavioral. Architectural modularity means that a
system is composed of subsystems using the operations of parallel composition
and hiding of variables. The input languages of standard model checkers (e.g.,
S/R in Cospan [AKS83] or Reactive modules in Mocha [AH99]) support
architectural modularity, but provide no support for modular description of the
behaviors of individual components. In this paper, we focus on exploiting the
behavioral hierarchy for efficient model checking.

The notion of behavioral hierarchy was popularized by the introduction of
Statecharts [Har87], and exists in many related modeling formalisms such as
Modecharts [JM87] and RSML [LHHR94]. It is a central component of various
object-oriented software development methodologies developed in recent years,
such as ROOM [SGW94], and the Unified Modeling Language (UML [BJR97]).
Such hierarchic specifications have many powerful primitives such as exceptions,
group transitions, and history, which facilitate modular descriptions of complex
behavior. The conventional approach to analyze such specifications is to compile
them into input languages of existing model checkers. For instance, Chan et
al. [CAB+98] have analyzed RSML specifications using SMV, and Leue et al. have
developed a hierarchical and visual front-end to Spin. While the structure of the
source language is exploited to some extent (e.g., [CAB+98] reports heuristics
for variable ordering based on hierarchical structure, and [BLA+99] reports a
way of avoiding repeated search in the same context), compilation into a flat target
language loses the input structure. In terms of theoretical results concerning the
analysis of such descriptions, verifying linear properties of sequential hierarchical
machines can be done efficiently without flattening [AY98], but in the presence of
concurrency, hierarchy causes an exponential blow-up [AKY99].

The input language to our model checker is based on hierarchic reactive
modules [AG00]. This choice was motivated by the fact that, unlike Statecharts
and other languages, in hierarchic reactive modules, the notion of hierarchy is
semantic with an observational trace-based semantics and a notion of refinement
with assume-guarantee rules. Furthermore, hierarchic reactive modules support
extended state machines where the communication is via shared variables. The
first contribution of this paper is a concrete implementation of hierarchic
reactive modules. Our implementation is visual, consistent with modern software
design tools, and is in Java. The central component of the description is a mode.
The attributes of a mode include global variables used to share data with its
environment, local variables, well-defined entry and exit points, and submodes
that are connected with each other by transitions. The transitions are labeled
with guarded commands that access the variables according to the natural
scoping rules. Note that the transitions can connect to a mode only at its
entry/exit points, as in ROOM, but unlike Statecharts. This choice is
important in viewing the mode as a black box whose internal structure is not visible
from outside. The mode has a default exit point, and transitions leaving the
default exit are applicable at all control points within the mode and its submodes.
The default exit retains the history, and the state upon exit is automatically
restored by transitions entering the default entry point. Thus, a transition from
default exit to entry models a group transition applicable to all control points
inside. While defining the operational semantics of modes, we follow the
standard paradigm in which transitions are executed repeatedly until there are no
more enabled transitions. Our language distinguishes between a mode definition
and a mode reference, and this allows sharing and reuse.

Our model checker checks invariants by reachability analysis of the input
model. The model is parsed into an internal representation that directly reflects the
hierarchical structure, and the analysis algorithms, symbolic and enumerative,
attempt to exploit it in different ways:

Transition indexing. The transition relation is maintained indexed by the
modes and their control points. In the enumerative setting, this is beneficial
for quick access to potentially enabled transitions, and also due to shared
mode definitions. In the symbolic setting, this provides a generalization of
the traditional conjunctively partitioned representation [McM93].

State-space representation. In the enumerative setting, states are
represented as stacks of vectors rather than vectors, and this is useful in handling
priorities of transitions. In symbolic search, we maintain the state-space as
a forest of binary decision diagrams indexed by control points. The resulting
search has, consequently, a mixture of enumerative and symbolic strategies.

Typing. Each mode explicitly declares the variables that it reads and writes,
thus providing different types for different transitions. This information is
used in symbolic search for heuristics such as early quantification.

Variable scoping. The pool of variables is not global. For instance, the state
can consist of variables x and y in one mode, and x and z in another.
This information, available statically, is exploited by both the searches. This
optimization is possible due to the encapsulation provided by our language.

Note that the above heuristics are quite natural to the hierarchical
representation, and have advantages with respect to the flat representation of the same
model. Another advantage of the language is that the granularity of steps of
interacting components can be controlled as desired. This is because a macro-step
of a mode corresponds to executing its micro-steps repeatedly until there are no
more enabled transitions, and parallel composition corresponds to interleaving
macro-steps.
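The relation between micro-steps and macro-steps can be illustrated with a small Python sketch. It is only a toy model under assumptions not taken from the paper: a transition is a hypothetical `(guard, action)` pair of functions over a state, nondeterminism is resolved by taking the first enabled transition, and a step bound guards against non-terminating micro-step sequences.

```python
def macro_step(state, transitions, max_micro_steps=10_000):
    """Run micro-steps until no transition is enabled; return the final state."""
    for _ in range(max_micro_steps):
        enabled = [action for guard, action in transitions if guard(state)]
        if not enabled:
            return state              # no more enabled transitions: macro-step ends
        state = enabled[0](state)     # one micro-step (first enabled transition)
    raise RuntimeError("micro-steps did not terminate within the bound")


# Toy mode (illustrative): a single transition that increments a counter below 3.
transitions = [(lambda s: s < 3, lambda s: s + 1)]
print(macro_step(0, transitions))
```

Under this reading, parallel composition would interleave calls to `macro_step` for the individual components rather than interleaving their micro-steps.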

We have implemented an enumerative checker based on depth-first search,
and a symbolic search that uses BDD packages from VIS [BHSV+96]. We report
on two case studies. As a first example, we modeled the TCP protocol. Our visual
interface allowed a direct mapping of the block-diagram description from [PD96],
and our enumerative checker found a deadlock bug in that description. Second,
we constructed an example that is illustrative of nesting and sharing of modes,
and scoping of variables. The performance of both enumerative and symbolic
checkers is significantly superior compared to the respective traditional checks.

2 Modeling Language 

Modes A mode has a refined control structure given by a hierarchical state ma- 
chine. It basically consists of a set of submode instances connected by transitions 
such that at each moment of time only one of the submode instances is active. 
A submode instance has an associated mode and we require that the modes 
form an acyclic graph with respect to this association. For example, the mode 
M in Figure 1 contains two submode instances, m and n pointing to the mode 
N. By distinguishing between modes and instances we may control the degree 
of sharing of submodes. Sharing is highly desirable because submode instances 
(on the same hierarchy level) are never simultaneously active in a mode. Note 
that a mode resembles an or state in Statecharts but it has more powerful 
structuring mechanisms. 
Fig. 1. Mode diagrams

Variables and scoping A mode may have global as well as local variables. The 
set of global variables is used to share data with the mode’s environment. The 
global variables are classified into read and write variables. The local variables of 
a mode are accessible only by its transitions and submodes. The local and write 
variables are called controlled variables. Thus, the scoping rules for variables are 
as in standard structured programming languages. For example, the mode M in 
Figure 1 has the global read variable x, the global write variable y and the local 
variable z. Similarly, the mode N has the global read-write variable z and the 
local variable u. 

The transitions of a mode may refer only to the declared global and local 
variables of that mode and only according to the declared read/write permission. 
For example, the transitions a,b,c,d,e,f,g,h,i,j and k of the mode M may 
refer only to the variables x , y and z. Moreover, they may read only x and z and 
write y and z. The global and local variables of a mode may be shared between 
submode instances if the associated submodes declare them as global (the set of 
global variables of a submode has to be included in the set of global and local 
variables of its parent mode). For example, the value of the variable z in Figure 
1 is shared between the submode instances m and n. However, the value of the 
local variable u is not shared between m and n. 
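The inclusion rule for shared variables (the global variables of a submode must be among the global and local variables of its parent) lends itself to a simple static check. The Python sketch below is illustrative only: the `Mode` class and `scoping_violations` function are hypothetical names, not part of the authors' tool, and the example encodes the variable declarations described for modes M and N.

```python
from dataclasses import dataclass, field


@dataclass
class Mode:
    """Hypothetical mode record: declared variables and submode references."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    local: set = field(default_factory=set)
    submodes: list = field(default_factory=list)

    def global_vars(self):
        return self.reads | self.writes


def scoping_violations(mode):
    """List (parent, submode, missing variables) wherever the rule is broken."""
    visible = mode.global_vars() | mode.local
    problems = []
    for sub in mode.submodes:
        missing = sub.global_vars() - visible
        if missing:
            problems.append((mode.name, sub.name, sorted(missing)))
        problems.extend(scoping_violations(sub))
    return problems


# Mode N: global read-write variable z, local variable u.
mode_n = Mode("N", reads={"z"}, writes={"z"}, local={"u"})
# Mode M: reads x, writes y, local z; two submode instances both refer to N.
mode_m = Mode("M", reads={"x"}, writes={"y"}, local={"z"}, submodes=[mode_n, mode_n])
print(scoping_violations(mode_m))
```

Here z is local to M and global to N, so both instances of N share it, while u stays private to each instance.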

Control points and transitions To obtain a modular language, we require the
modes to have well defined control points classified into entry points (marked as
white bullets) and exit points (marked as black bullets). For example, the mode
M in Figure 1 has the entry points e1, e2, e3 and the exit points x1, x2, x3.
Similarly, the mode N has the entry points e1, e2 and the exit points x1, x2. The
transitions connect the control points of a mode and of its submode instances to
each other. For example, in Figure 1 the transition a connects the entry point
e2 of the mode M with the entry point e1 of the submode instance m. The names
of the control points of a transition are attributes, and our drawing tool allows
them to be optionally shown or hidden to avoid clutter.

According to the points they connect, we classify the transitions into entry, 
internal and exit transitions. For example, in Figure 1, a,d are entry transi- 
tions, h,i,k are exit transitions, b is an entry/exit transition and c,e,f ,g,j 
are internal transitions. These transitions have different types. Entry transitions 
initialize the controlled variables by reading only the global variables. Exit tran- 
sitions read the global and local variables and write only the global variables. 
The internal transitions read the global and the local variables and write the 
controlled variables. 

Default control points To model preemption, each mode (instance) has a
special, default exit point dx. In mode diagrams, we distinguish the default exit
point of a mode from the regular exit points of the mode by considering the
default exit point to be represented by the mode’s border. A transition starting
at dx is called a preempting or group transition of the corresponding mode. It
may be taken whenever the control is inside the mode and no internal transition
is enabled. For example, in Figure 1, the transition f is a group transition for
the submode n. If the current control point is q inside the submode instance n
and neither the transition b nor the transition f is enabled, then the control is
transferred to the default exit point dx. If one of e or f is enabled and taken
then it acts as a preemption for n. Hence, inner transitions have a higher priority
than the group transitions, i.e., we use weak preemption. This priority scheme
facilitates a modular semantics. As shown in Figure 1, the transfer of control to
the default exit point may be understood as a default exit transition from an
exit point x of a submode to the default exit point dx that is enabled if and
only if all the explicit outgoing transitions from x are disabled. We exploit this
intuition in the symbolic checker.

History and closure To allow history retention, we use a special default entry 
point de. As with the default exit points, in mode diagrams the default entry 
point of a mode is considered to be represented by the mode’s border. A transition
entering the default entry point of a mode either restores the values of all
local variables along with the position of the control or initializes the controlled 
variables according to the read variables. The choice depends on whether the last 
exit from the mode was along the default exit point or not. This information is 
implicitly stored in the constructor of the state passed along the default entry 
point. For example, both transitions e and g in Figure 1, enter the default entry 
point de of n. The transition e is called a self group transition. A self group 
transition like e or more generally a self loop like f , p , g may be understood 
as an interrupt handling routine. While a self loop may be arbitrarily complex, 
a self transition may do simple things like counting the number of occurrences 
of an event (e.g., clock events). Again, the transfer of control from the default 
entry point de of a mode to one of its internal points x may be understood as a 
default entry transition that is taken when the value of the local history variable 
coincides with x. If x was a default exit point n.dx of a submode n then, as 
shown in Figure 1, the default entry transition is directed to n.de. The reason 
is that in this case, the control was blocked somewhere inside of n and default 
entry transitions originating in n.de will restore this control. A mode with added
default entry and exit transitions is called closed. Note that the closure is
a semantic concept. The user is not required to draw the implicit default entry
and exit transitions. Moreover, the user can override the defaults by defining
explicit transitions from and to the default entry and exit points.

Operational semantics: macro-steps In Figure 1, the execution of a mode,
say n, starts when the environment transfers the control to one of its entry points
e1 or e2. The execution of n terminates either by transferring the control back
to the environment along the exit points x1 or x2 or by “getting stuck” in q or
r as all transitions starting from these leaf modes are disabled. In this case the 
control is implicitly transferred to M along the default exit point n.dx. Then, if 
the transitions e and f are enabled, one of them is nondeterministically chosen 
and the execution continues with n and respectively with p. If both transitions 
are disabled the execution of M terminates by passing the control implicitly to its 
environment at the default exit M.dx. Thus, the transitions within a mode have 
a higher priority compared to the group transitions of the enclosing modes. 

Intuitively, a round of the machine associated to a mode starts when the 
environment passes the updated state along a mode’s entry point and ends when 
the state is passed to the environment along a mode’s exit point. All the internal 
steps (the micro steps) are hidden. We also call a round a macro step. Note that
the macro step of a mode is obtained by alternating its closed transitions and 
the macro steps of the submodes. 

Semantics The execution of a mode may be best understood as a game, i.e., 
as an alternation of moves, between the mode and its environment. In a mode 
move, the mode gets the state from the environment along its entry points. It 
then keeps executing until it gives the state back to the environment along one of 
its exit points. In an environment move, the environment gets the state along one 
of the mode’s exit points. Then it may update any variable except the mode’s 
local ones. Finally, it gives the state back to the mode along one of its entry 
points. An execution of a mode M is a sequence of macro steps of the mode. 
Given such an execution, the corresponding trace is obtained by projecting the 
states in the execution to the set of global variables. The denotational semantics 
of a mode M consists of its control points, global variables, and the set of its 
traces. 

Parallel composition by interleaving A mode having only two points, the 
default entry and the default exit, is called a mode in top level form. These modes 
can be used to explicitly model all the parallel composition operators found in the 
theory of reactive systems [AG00]. For simplicity we consider in this paper only
the interleaving semantics of parallel composition. In this semantics, a round
(macro step) of a composed mode is a round of one of its submodes. The choice
between the submodes is arbitrary. We can easily model this composition by
overriding the default entry transitions of the submodes if these are in top level
form. Note also that the state of the supermode is given, as expected, as a tuple
of the states of the submodes.

3 Search Algorithms 

3.1 Enumerative Search 

The enumerative search algorithm takes as input a set of top-level modes and
a set of global variables that these modes can read and modify. We are also
given an invariant that we want this system to satisfy. The invariant is a boolean
expression defined on the global variables. For the enumerative search we assume
that each of the top-level modes is sequential; each top-level mode represents
a single thread of execution. These top-level modes are run concurrently by
interleaving their macro-steps. A state of the system consists of the values of all
the global variables and the state of each of the top-level modes. In each round
one of the modes may modify the variables and change its own internal state
yielding a new state. The set of states of the system can therefore be viewed as
a directed graph where, if s and t are states of the system, (s,t) is an edge in
the graph if and only if s yields t after one round of execution.

Fig. 2. Local variables conserve state size

Searching all states is straightforward; beginning from an initial state we 
perform a depth-first search on the graph. For each state we encounter during 
the search we check that the desired invariant holds. If the invariant doesn’t hold 
for some state then the depth-first search algorithm supplies us with a path in 
the graph from the initial state to the state which violates the invariant. This 
path forms a counter-example which is returned to the user. 
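The search just described can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name `find_violation`, the `successors` callback, and the toy counter system are all assumptions made for the example. States are assumed to be hashable so that the visited set can be a hash table.

```python
def find_violation(initial, successors, invariant):
    """Depth-first search over the state graph.

    Returns a path (list of states) from `initial` to a state violating
    `invariant`, or None if every reachable state satisfies it.
    """
    if not invariant(initial):
        return [initial]
    visited = {initial}  # hash set: constant-time membership on average
    # The states on the stack always form a path starting at `initial`.
    stack = [(initial, iter(successors(initial)))]
    while stack:
        _, pending = stack[-1]
        for nxt in pending:
            if nxt in visited:
                continue
            visited.add(nxt)
            if not invariant(nxt):
                return [state for state, _ in stack] + [nxt]
            stack.append((nxt, iter(successors(nxt))))
            break
        else:
            stack.pop()  # all successors explored: backtrack
    return None


# Toy system (hypothetical): a counter modulo 6 with two possible moves per round.
def toy_successors(s):
    return [(s + 1) % 6, (s * 2) % 6]


counterexample = find_violation(0, toy_successors, lambda s: s != 5)
```

Because the stack holds exactly the states on the current search path, the counter-example is available without storing parent pointers.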

The set of states that have been visited is stored in a hash table so that we 
can check if a given state is in the set in constant time. The set of successors of 
a state is computed by examining the modes. The hierarchical structure of the 
modes is retained throughout the search. This structure allows us to optimize 
the search in a number of ways. 

Transition Indexing To determine if there is an edge from a state s to a state 
t we need to examine all the possible sequences of micro-steps that are enabled 
in s. Each micro-step corresponds to a transition from the active control point 
to a destination point (which becomes the active point in the next micro-step) . 
In the mode representation transitions are indexed by their starting points. If 
we want to find all enabled transitions we only need to examine those that start 
from the active point. 
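A small sketch of such an index is given below. The class `TransitionIndex`, the representation of guards and actions as Python functions, and the point names are assumptions chosen for illustration only.

```python
from collections import defaultdict


class TransitionIndex:
    """Transitions of a mode, indexed by their starting control point."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, source, guard, action, target):
        self._by_source[source].append((guard, action, target))

    def micro_steps(self, point, state):
        """Yield (next_point, next_state) for the transitions enabled at `point`."""
        # Only transitions starting at the active point are examined.
        for guard, action, target in self._by_source.get(point, ()):
            if guard(state):
                yield target, action(state)


N = 2  # hypothetical bound on the local counter
index = TransitionIndex()
index.add("k", lambda s: s["c"] == 1 and s["u"] < N,
          lambda s: {"c": 0, "u": s["u"] + 1}, "k")
index.add("k", lambda s: s["c"] == 2, lambda s: dict(s), "o")
index.add("i", lambda s: s["c"] != 2, lambda s: {"c": 0, "u": 0}, "k")

steps = list(index.micro_steps("k", {"c": 1, "u": 0}))
```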

Local Variables Modes have local variables which are only visible to submodes. 
The internal state of a mode can be stored as a stack of sets of variables. This 
stack resembles the control stack present during the execution of a program in 
C or Java. Each element of the stack contains the variables that are local to the 
corresponding level in the mode hierarchy.

Since local variables of a mode are available only to the submodes of that
mode, the size of a mode’s state is smaller than in a system where all variables are
global. Figure 2 shows a simple mode diagram and three possible states of the
mode. Modes M1 and M3 are submodes of M2. States s1, s2 and s3 are the
respective possible states when M1, M2 and M3 are active. If M1 is active the
x, y and a variables are in scope and therefore must be present in the state s1.
The variable u is not in scope if M1 is active and therefore it is not present in
state s1. If M2 is active then x, y and u are not in scope and not present in s2.
Similarly, s3 does not contain the x and y variables. This means the total size
of the state of a mode is proportional to the depth of the hierarchy.

State Sharing The stack structure of a mode’s state also allows us to conserve
memory by sharing parts of the state. Two states which are distinct may
nevertheless contain some stack elements which are identical. We can construct 
the states in such a way that the equivalent elements of the stack are actually 
the same piece of memory. 

For example, states s and t have two levels of hierarchy corresponding to 
local and global variables. If all the global variables have the same values in s 
and t then both s and t can refer to the same piece of memory for storing the 
values of the global variables. 
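One way to obtain this sharing is to intern the stack elements, as in the sketch below. The names `FramePool` and `make_state` are hypothetical, and the representation of a level as a sorted tuple of (variable, value) pairs is an assumption of the example rather than the tool's data structure.

```python
class FramePool:
    """Interns stack frames so that equal frames are stored only once."""

    def __init__(self):
        self._frames = {}
        self.requested = 0  # how many frames the states asked for

    def intern(self, variables):
        self.requested += 1
        frame = tuple(sorted(variables.items()))
        # Return the frame already stored if an equal one exists.
        return self._frames.setdefault(frame, frame)

    @property
    def allocated(self):
        return len(self._frames)


def make_state(pool, *levels):
    """A state is a stack (tuple) of interned frames, outermost level first."""
    return tuple(pool.intern(level) for level in levels)


pool = FramePool()
s = make_state(pool, {"g": 1, "h": 0}, {"x": 0})
t = make_state(pool, {"g": 1, "h": 0}, {"x": 5})
```

Here s and t are distinct states, yet their global level is the same object in memory, so four requested frames need only three allocations.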

State Hashing The tool gives the user the option of storing a hash of a state 
instead of the entire state. Since the hashed version of a state occupies less 
memory than the state itself we can search more states before we run out of 
memory. The problem with this technique is that the enumerative search will 
skip states whose hashes happen to be the same as previously visited states. As 
a result the enumerative search may falsely conclude that an invariant is true 
for all states. On the other hand, any counter-example that is found is valid. 
By varying the amount of information lost through hashing we can balance the 
need for accuracy with the need to search large state spaces. The Spin model- 
checker uses this hashing technique [Hol91] for states which have a fixed number 
of variables. However, our tool must hash states which vary in size and structure 
(because of the local variables). 
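The trade-off can be illustrated with the following sketch. It is not the hashing scheme of our tool or of Spin: the use of SHA-256, the `bits` parameter, and the assumption that `repr` gives a canonical encoding of a variable-size state are all choices made for the example.

```python
import hashlib


def fingerprint(state, bits=32):
    """Reduce a state of any size and shape to a `bits`-bit integer (1 <= bits <= 64)."""
    digest = hashlib.sha256(repr(state).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


class CompactVisitedSet:
    """Stores only fingerprints; colliding states are wrongly treated as visited."""

    def __init__(self, bits=32):
        self.bits = bits
        self._seen = set()

    def add_if_new(self, state):
        fp = fingerprint(state, self.bits)
        if fp in self._seen:
            return False  # visited before, or a hash collision
        self._seen.add(fp)
        return True


compact = CompactVisitedSet(bits=64)
first = compact.add_if_new((("g", 1), ("x", 0)))
again = compact.add_if_new((("g", 1), ("x", 0)))

# With a 1-bit fingerprint at most two of three distinct states look new.
tiny = CompactVisitedSet(bits=1)
outcomes = [tiny.add_if_new(n) for n in range(3)]
```

Fewer bits save memory per state but make it more likely that an unvisited state is skipped.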

3.2 Symbolic Search 

Similarly to the enumerative search algorithm, the symbolic search algorithm 
takes as input a set of top-level modes and a set of variables that these modes can 
read and modify and an invariant that we want this system to satisfy. However, 
in contrast to the enumerative search we do not need to assume that the top- 
level modes are sequential. The reason is that a state in this case is not a stack, 
but rather a map (or context) of variables to their values. This context varies 
dynamically, depending on the currently accessible variables. 

In order to perform the symbolic search for a hierarchic mode we could proceed
as follows. (1) Obtain a flat transition relation associated to the hierarchic
mode. (2) Represent the reached states and the transition relation by ordered
multi-valued binary decision diagrams (mdds). (3) Apply the classic symbolic
search algorithm. This is the current model checking technology.

We argue however, that such a flat representation is not desirable. The main 
reason is its inefficient use of memory. One can do much better by keeping the 
above mdds in a decomposed way, as suggested by the modular structure. In 
particular, a natural decomposition is obtained by keeping and manipulating 
the control points outside the mdds. 

Reached set representation. Keeping the control points outside the state 
allows us to partition the state space in regions, each containing all states with 
the same control point. This decomposition has not only the advantage that any 
partition may be considerably smaller than the entire set but also that it is very 
intuitive. It is the way mode diagrams, and in general extended state machines, 
are drawn. Hence, we represent the reached state space by a mapping of the 
currently reached control points to their associated reached region mdd. The 
region mdd of a control point is minimized by considering only the variables 
visible at that control point. This takes advantage of the natural scoping of 
variables in a hierarchic mode. 
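The shape of this representation can be sketched with explicit sets standing in for mdds. The class `ReachedSet`, the `scope` table, and the point names below are hypothetical; a real implementation would store one mdd per control point instead of a Python set.

```python
class ReachedSet:
    """Maps each reached control point to the region reached at that point."""

    def __init__(self, scope):
        self._scope = scope    # control point -> names of the variables visible there
        self._regions = {}     # control point -> set of projected valuations

    def add(self, point, valuation):
        """Project `valuation` onto the variables visible at `point`.

        Returns True if the projected valuation was not yet in the region.
        """
        projected = tuple((v, valuation[v]) for v in self._scope[point])
        region = self._regions.setdefault(point, set())
        if projected in region:
            return False
        region.add(projected)
        return True

    def region(self, point):
        return self._regions.get(point, set())


scope = {"ctrl00.k": ("c", "w", "v0", "u0"), "ctrl0.o": ("c", "w")}
reached = ReachedSet(scope)
new_1 = reached.add("ctrl0.o", {"c": 0, "w": 1, "v0": 2, "u0": 0})
new_2 = reached.add("ctrl0.o", {"c": 0, "w": 1, "v0": 2, "u0": 1})
```

The two valuations differ only in a variable that is out of scope at the point, so they collapse to a single element of its region.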

Update relation representation. The update relation of a hierarchic mode
d is not flattened. It is kept in d by annotating each transition of d with the
mdd corresponding to the transition. This has the following advantages. First,
all instances of d at the same level of the hierarchy and connected only at their
regular points may share these transitions. The reason is that their local variables
are never simultaneously active. Second, working with scoped transition relations
and knowing the set of variables $U$ updated by a transition $t$ (broadly speaking,
working with typed transition relations) we may compute the image $image(R, t)$
of a region $R$ in an optimal way as $(\exists U.\, R \wedge t)[U/U']$ and not as
$(\exists V.\, R \wedge t \wedge id_{V \setminus U})[V/V']$, where $V$ is the set of all variables and
$id_{V \setminus U}$ is the conjunction of all relations $x' = x$ with $x \in V \setminus U$. The second,
less efficient representation is, to our knowledge, the way the image is computed
in all current model checkers. Moreover, while the internal and default transitions
have this optimized form, the entry and exit transitions allow even further
optimization via quantification of variables that are no longer needed.

Entry transitions. $image(R, t) = (\exists (U \setminus V_l).\, R \wedge t)[U/U']$ because $R$ and $t$ are
not allowed to reference the unprimed local variables in $V_l$.

Exit transitions. $image(R, t) = (\exists (U \cup V_l).\, R \wedge t)[U/U']$ because $t$ is not
allowed to reference the primed local variables in $V_l'$ and the unprimed local
variables in $V_l$ are hidden.
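The difference between the typed and the flat image of an internal transition can be checked on explicit sets of states. The sketch below is illustrative only: regions are enumerated rather than stored as mdds, and the functions `image_typed` and `image_flat`, the increment relation `inc`, the bound `N`, and the variable names are assumptions of the example.

```python
from itertools import product


def image_typed(region, relation, updated, domain):
    """(exists U. R and t)[U/U']: only the updated variables U receive new values."""
    result = set()
    for state in region:
        cur = dict(state)
        for values in product(domain, repeat=len(updated)):
            nxt = dict(zip(updated, values))
            if relation(cur, nxt):
                result.add(tuple(sorted({**cur, **nxt}.items())))
    return result


def image_flat(region, relation, updated, variables, domain):
    """(exists V. R and t and id)[V/V']: all variables are primed, plus frame conditions."""
    frame = [v for v in variables if v not in updated]
    result = set()
    for state in region:
        cur = dict(state)
        for values in product(domain, repeat=len(variables)):
            nxt = dict(zip(variables, values))
            if relation(cur, nxt) and all(nxt[v] == cur[v] for v in frame):
                result.add(tuple(sorted(nxt.items())))
    return result


N = 2
DOMAIN = range(N + 1)
VARIABLES = ("c", "w", "v0", "u0")
UPDATED = ("c", "u0")


def inc(cur, nxt):
    # Hypothetical increment transition: reads c and u0, writes c and u0.
    return (cur["c"] == 1 and cur["u0"] < N
            and nxt["c"] == 0 and nxt["u0"] == cur["u0"] + 1)


region = {tuple(sorted({"c": 1, "w": 1, "v0": 2, "u0": 0}.items()))}
typed = image_typed(region, inc, UPDATED, DOMAIN)
flat = image_flat(region, inc, UPDATED, VARIABLES, DOMAIN)
```

Both computations give the same region, but the typed version ranges over assignments to two variables while the flat version ranges over all four and needs the extra identity constraints.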

Variable ordering The variable ordering is naturally suggested by the partial 
ordering between modes. Considering that variable names are made disjoint by 
prefixing them with the names of the submode instances along the path to the 
referencing submode instance, then the ordering is nothing but the lexicographic 
ordering of the prefixed names. This ordering makes sure that variables defined 
at the same level in the mode hierarchy are grouped together. 
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Fig. 3. A generic hierarchic example 

Initialization. The initial state is a mapping of the history variables to a 
special, bottom value. Passing this state along the default entry point of the 
top-level mode, all the way down in the mode hierarchy, assures the selection 
of an initialization transition that updates the local variables according to their 
initialization statement in the mode diagram. The initial reached set maps the 
default entry point of the top level mode to this state. 

Image computation. The main loop of the image computation algorithm is 
as usual. It starts with the initial macro onion ring (the initial reached set) and 
computes in each iteration (macro-step) a new macro onion ring by applying the 
image computation to the current macro onion and (the update relation of) the 
top level mode. The algorithm terminates either if the new macro onion ring is 
empty or if its intersection with the target region (containing the “bad” states) 
is nonempty. 

The image computation of the next macro onion ring is the secondary loop. It 
starts with a micro onion ring having only one control point: the pair consisting 
of the default entry point of the top level mode and the macro onion ring mdd. 
Each micro step computes a new onion ring by applying the image computation 
to all points in the current micro onion ring and for each point to all outgoing 
transitions in a breadth first way. A destination point is added to the new micro 
onion ring if the difference between the computed mdd and the mdd associated to 
that point in the reached set is not empty. The mdd in the reached set is updated 
accordingly. The loop terminates when the new micro onion ring contains again 
only one control point, the default exit point of the top level mode and its 
associated mdd. The new macro onion ring is then the difference between the 
set of states corresponding to this mdd and the reached set associated to the top 
level default entry point. 
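The main (macro) loop can be sketched with explicit sets in place of mdds. The secondary micro-step loop is abstracted into a `macro_image` callback, which, like the function name `bad_state_reachable` and the toy counter, is an assumption of this example.

```python
def bad_state_reachable(initial, macro_image, target):
    """Iterate macro onion rings until a ring is empty or meets the target region."""
    reached = set(initial)
    ring = set(initial)  # the initial macro onion ring
    while ring:
        if ring & target:
            return True
        # Keep only the states that were not reached in earlier rounds.
        ring = macro_image(ring) - reached
        reached |= ring
    return False


# Toy macro step (hypothetical): a counter that stops at 5.
def toy_image(states):
    return {s + 1 for s in states if s < 5}


hit = bad_state_reachable({0}, toy_image, {3})
miss = bad_state_reachable({0}, toy_image, {9})
```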

Generic hierarchic system. In Figure 3 we show a generic hierarchic system
with three levels of nesting. The mode user nondeterministically sets a control
variable c to 1 or 2, meaning increment and leave, respectively. The mode
system consists of the nested modes ctrl0 and ctrl1, which are further decomposed
into the modes ctrl00, ctrl01 and ctrl10, ctrl11, respectively. Each has a
local variable ranging between zero and a maximum value n that is incremented
when c is 1 and the value of the local variable is less than n. When c is 2 or
when c is 1 and the local variable has reached n the mode is left. To ensure only
one increment per macro-step, after performing a transition the mode sets c to
0. This blocks it until the user sets the next value. The identity transition id is
the same on all levels and it is defined as follows:

id = true -> skip 

The transitions lv and inc of the mode system are defined as below:

lv = c=2 | (c=1 & w=n) -> c:=0; w:=0;

inc = c=1 & w<n -> c:=0; w:=w+1;

Except for the local variable to be tested and incremented, the transition inc has
the same definition in all modes. The exit transition lv has a simpler body in
the submodes. For example in mode ctrl00:

lv = c=2 | (c=1 & u0=n) -> skip

Finally, the entry transitions en and lv of the leaf modes have the same definition
modulo the local variable. For example, in mode ctrl00:

lv = c=2 -> skip

en = c!=2 -> c:=0; u0:=0;

The mdd associated to the point k of the mode ctrl00 is a boolean relation
$R_{ctrl00.k}(c, w, v_0, u_0)$. Similarly, the mdds associated to the exit point o of the
modes ctrl00 and ctrl0 are boolean relations $R_{ctrl00.o}(c, w, v_0)$ and $R_{ctrl0.o}(c, w)$,
respectively. Note that variables are not quantified out at the default exit points
because we have to remember their value. Additionally, at these points we have
to save the active submode information. Hence, the mdds associated to the
default exit points dx of the modes ctrl00 and ctrl0 are boolean relations
$R_{ctrl00.dx}(c, w, v_0, u_0)$ and $R_{ctrl0.dx}(c, w, v_0, u_0, h_0)$, respectively. The history
variable $h_0$ is 0 if the active submode is ctrl00 and 1 if the active submode is
ctrl01.

The mdd associated to a transition is a relation constructed, as usual, by
considering primed variables for the next state values. For example, the transition
inc of the mode ctrl00 is defined by the relation

$$inc(c, c', u_0, u_0') = (c = 1 \wedge u_0 < n \wedge c' = 0 \wedge u_0' = u_0 + 1)$$

The image of the region $R_{ctrl00.k}$ under the transition inc is computed as below:

$$(\exists c, u_0.\ R_{ctrl00.k}(c, w, v_0, u_0) \wedge inc(c, c', u_0, u_0'))[c, u_0 / c', u_0']$$

Lacking typing information, most model checkers use the more complex relation:

$$(\exists c, w, v_0, u_0.\ R_{ctrl00.k}(c, w, v_0, u_0) \wedge inc(c, c', u_0, u_0') \wedge w' = w \wedge v_0' = v_0)[c, w, v_0, u_0 / c', w', v_0', u_0']$$

The exit transition lv of the mode ctrl00 is defined by the following relation:

$$lv(c, u_0) = (c = 2 \vee (c = 1 \wedge u_0 = n))$$

The image of the region $R_{ctrl00.k}$ under this transition is computed as below:

$$\exists u_0.\ R_{ctrl00.k}(c, w, v_0, u_0) \wedge lv(c, u_0)$$

It quantifies out the local variable $u_0$. It is here where we obtain considerable
savings compared to classic model checkers. The image of the region $R_{ctrl00.i}$
under the entry transition:

$$en(c, c', u_0') = (c \neq 2 \wedge c' = 0 \wedge u_0' = 0)$$

does not quantify out the local variable $u_0$ even if $u_0$ was updated, because the
transition is not allowed to use the unprimed value of $u_0$.

$$(\exists c.\ R_{ctrl00.i}(c, w, v_0) \wedge en(c, c', u_0'))[c, u_0 / c', u_0']$$

The ordering of the unprimed variables in the generic hierarchic system is defined
as follows: $c < w < v_0 < u_0 < u_1 < v_1 < u_2 < u_3$.

4 Experimental Results 

Mutual Exclusion Our smallest non-trivial experiment involved Peterson’s
algorithm for two-party mutual exclusion [Pet81]. We implemented the algorithm
using three modes that run concurrently. Modes p1 and p2 represented the two
parties that want to use the shared resource. A special mode called clock was used 
to toggle a variable tick that the other modes consume whenever a step of the 
algorithm is executed. This use of a tick variable ensures that each macro-step of 
a mode corresponds to exactly one step of the algorithm that we are modeling. 
Once the tick variable is consumed a mode is blocked until the next macro-step. 
This technique allows a programmer to control the number of micro-steps that 
occur within one macro-step. Our tool performed an enumerative search of all 
possible executions. The search revealed that the model has 276 distinct states, 
all of which preserve mutual exclusion. The tool also verified that the algorithm 
is free of deadlock by checking that each state of the model leads to at least 
one successor state. Both searches took about 4 seconds to complete on an Intel
Celeron 333 MHz.

TCP The Transmission Control Protocol (TCP) is a popular network protocol 
used to ensure reliable transmission of data. TCP connections are created when 
a client opens a connection with a server. Once a connection is opened the client 
and server exchange data until one party decides to close the connection. When 
a connection is opened or closed the client and server exchange special messages 
and enter a series of states. 

TCP is designed to work even if some messages get lost or duplicated. It is 
also designed to work if both parties simultaneously decide to close a connection. 
A desirable property of a protocol like TCP is that it cannot lead to deadlock; it
should be impossible for both client and server to be waiting for the other to
send a message.

Fig. 4. TCP state-transition diagram

There is a concise description of the messages and states of TCP in [PD96] 
which is reproduced in Figure 4. This description is given as a state-transition 
diagram which makes it very easy to model it as a mode in our tool. We set 
out to verify that the TCP protocol, as described in [PD96], is free of deadlock 
under certain assumptions. 

In our first experiment we simulated a client and server opening and closing 
a TCP connection. In our model we assumed that the network never lost or 
duplicated a message. We also assumed the client and server both had a one cell 
queue for storing incoming messages; if a party received a second message before 
it had a chance to process the first message then the first message would be lost. 
Our tool performed an enumerative search of possible execution sequences and 
discovered a bug in the description of TCP after searching 2277 states. If both 
parties decide simultaneously to close the connection while in the established 
state then they will both send FIN messages. One of the parties, say the client, 
will receive the FIN message and respond by sending an ACK message and 
entering the closing state. If this ACK arrives at the server before the other 
FIN is processed the second FIN will get lost. When the ACK is read the server 
will move to the fin-wait-2 state. Now the protocol is deadlocked; the client is 
waiting for an ACK message while the server is waiting for a FIN message. 

For our second experiment we modified the model so that the client and 
server had queues that could hold two messages instead of one. A message would 
only get lost if a third message arrived before the first was processed. Our tool 
found a deadlock state in this model after searching 3535 states. Once again
the deadlock occurs after both client and server decide simultaneously to close
the connection. In this case, however, the server decides to close the connection 
before it has been established. This can lead to a state where the server’s queue 
gets filled and a message gets dropped. The deadlock occurs when the server is 
in the fin-wait-2 state and the client is in the closing state, which is the same 
deadlock state that we saw in the first TCP experiment. 

Both experiments were performed using an Intel Celeron 333 MHz machine.
The first experiment ran for 74 seconds and the second experiment ran for 138 
seconds. 



Generic hierarchic system The hierarchical structure of the example in Figure 3
makes it a good candidate for the state sharing optimization described in
Section 3.1. The three levels of local variables allow memory to be re-used when 
storing states; if some levels are identical in two distinct states the enumerative 
search algorithm makes an effort to share the memory used to store the levels 
that both states have in common. 

Each state contains a set of objects called environments. Each environment 
keeps track of one level of the hierarchy. An enumerative search of this example 
found 3049 distinct states in 138 seconds. These states contained 14879 environment
objects, but because of state sharing only 8103 environment objects
needed to be allocated. The technique yields a 45% reduction in the number of
objects needed to store the set of visited states.

The symbolic search takes advantage of the existential quantification of local
variables at all regular exit points. This leads, as shown in Figure 4, to a significant
saving both in space (total number of nodes in the mdd pool) and time
(for the reachability check) compared to the C version of the Mocha model
checker [AHM+98]. The comparison was done for different values of n.

Since the concurrency in this example is only on the top level and all the
variables (except c) are local, one can heavily share the transition relations.
In fact, only the modes ctrl0 and ctrl00 are necessary and all references may
point to these modes. As expected, using this sharing, we obtained the same
results with respect to the reached set.

5 Conclusions

We have reported on an implementation of a visual hierarchical language for 
modeling reactive systems, and enumerative and symbolic checkers that work 
directly on the hierarchical representation attempting to exploit the modularity. 
While hierarchical specifications seem more convenient to express complex requirements,
in terms of the efficiency of analysis, two questions are of interest. First,
given a verification problem, should one use a hierarchical notation hoping for 
tractable analysis? We don’t have adequate experimental evidence yet to answer 
this. In fact, given that the modeling languages of different tools differ so much, 
and different tools implement many different heuristics, parameters of a scientific 
comparison are unclear. Second, if the input specification is hierarchical, should 
one use the proposed solution over compilation into a non-hierarchical checker? 
Even though experimental data is small so far, we believe that there is adequate 
conceptual evidence suggesting a positive answer. A lot of work remains to be
done to optimize the two checkers and to apply them to more substantial
examples.

Abstract. This is a study of the formal verification of a VLIW microprocessor that
imitates the Intel Itanium [9][12][17] in features such as predicated execution, register
remapping, advanced and speculative loads, and branch prediction. The formal
verification is done with the Burch and Dill flushing technique [5] by exploiting the
properties of Positive Equality [3][4]. The contributions include an extensive use of
conservative approximations in abstracting portions of the processor and a framework
for decomposition of the Boolean evaluation of the correctness formula. The
conservative approximations are applied automatically when abstracting a memory
whose forwarding logic is not affected by stalling conditions that preserve the correctness
of the memory semantics for the same memory. These techniques allow a
reduction of more than a factor of 4 in the CPU time for the formal verification of the
most complex processor model examined relative to the monolithic evaluation of the
correctness formula for a version of the same processor where conservative approximations
are not applied.

1 Introduction 

VLIW architectures have been adopted recently by microprocessor design companies 
in an effort to increase the instruction parallelism and achieve higher instructions-per- 
cycle counts. The goal of this work is to make the Burch and Dill flushing technique 
[5] scale efficiently and with a high degree of automation for the formal verification of 
VLIW processors with speculative execution. The speculative features considered 
include predicated execution, register remapping, advanced and speculative loads, and 
branch prediction — all of them found in the Intel Itanium [9][12][17], to be fabricated 
in the summer of 2000. 

The focus of this work is on efficient and automatic scaling that is clearly impos- 
sible with theorem-proving approaches, as demonstrated by Sawada and Hunt [16] and 
Hosabettu et al. [11]. The former approach was applied to a superscalar processor and 
required the proofs of around 4,000 lemmas that could be defined only after months, if 
not a year, of manual work by an expert. The latter work examined the formal verifica- 
tion of a single-issue pipelined and a dual-issue superscalar processors, each of which 
was formally verified after a month of manual work, with the complexity of the manual
intervention increasing with the complexity of the verified design. Clearly, methods
that require months of manual work in order to detect a bug will be impractical for
the formal verification of wide VLIW designs with many parallel pipelines and speculative
features that will be fine-tuned constantly during aggressive time-to-market
design cycles.

The most complex VLIW processor examined in this paper has 9 parallel execution
pipelines, all the speculative features listed above, including branch prediction and
multicycle functional units of arbitrary latency. Its exhaustive binary simulation would
require more than 2^^^ sequences of 5 VLIW instructions each. However, the proposed
techniques allow that processor to be formally verified in less than 8 hours of CPU
time on a 336 MHz SUN4.

2 Background 

In this work, the logic of Equality with Uninterpreted Functions and Memories 
(EUFM) [5] is used in order to dehne abstract models for both the implementation and 
the specihcation processors. The syntax of EUFM includes terms and formulas. A 
term can be an Uninterpreted Function (UF) applied on a list of argument terms, a 
domain variable, or an ITE operator selecting between two argument terms based on a 
controlling formula, such that ITEiformula, terml, term!) will evaluate to terml when 
formula = true and to terml v/hsn formula = false. A formula can be an Uninterpreted 
Predicate (UP) applied on a list of argument terms, a propositional variable, an ITE 
operator selecting between two argument formulas based on a controlling formula, or 
an equation (equality comparison) of two terms. Formulas can be negated and con- 
nected by Boolean connectives. The syntax for terms can be extended to model memo- 
ries by means of the functions read and write, where read takes 2 argument terms 
serving as memory and address, respectively, while write takes 3 argument terms serv- 
ing as memory, address, and data. Both functions return a term. Also, they can be 
viewed as a special class of (partially interpreted) uninterpreted functions in that they 
are defined to satisfy the forwarding property of the memory semantics, namely that
read(write(mem, aw, d), ar) = ITE(ar = aw, d, read(mem, ar)), in addition to the
property of functional consistency. Versions of read and write that extend the syntax for
formulas can be defined similarly, such that the version of read will return a formula
and the version of write will take a formula as its third argument. Both terms and for- 
mulas are called expressions. 
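
The forwarding property can be illustrated with a minimal Python sketch in which memory states are nested terms and a read unfolds the writes exactly as the ITE above prescribes. The tuple encoding and the helper name `initial_read` are assumptions made for this illustration; they are not part of EUFM or of the tool described in this paper.

```python
# Illustrative sketch only: memory states are nested tuples, and addresses
# and data are concrete Python values. The encoding is hypothetical.

def write(mem, addr, data):
    """Return a new memory term; nothing is mutated."""
    return ("write", mem, addr, data)

def initial_read(mem, addr):
    """Content of an initial (unwritten) memory, kept as a symbolic term."""
    return ("read", mem, addr)

def read(mem, addr):
    """read(write(m, aw, d), ar) = ITE(ar = aw, d, read(m, ar))."""
    if isinstance(mem, tuple) and mem[0] == "write":
        _, older, aw, d = mem
        return d if addr == aw else read(older, addr)
    return initial_read(mem, addr)

m1 = write(write("mem0", 4, "a"), 8, "b")
print(read(m1, 8))    # served by the most recent write to address 8
print(read(m1, 4))    # falls through to the older write
print(read(m1, 12))   # never written: remains a symbolic read of mem0
```

Functional consistency holds trivially here because `read` and `write` are ordinary deterministic functions of their arguments.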

UFs and UPs are used to abstract away the implementation details of functional 
units by replacing them with “black boxes” that satisfy no particular properties other 
than that of functional consistency — the same combinations of values to the inputs of 
the UF (or UP) produce the same output value. Three possible ways to impose the 
property of functional consistency of UFs and UPs are Ackermann constraints [1], 
nested ITEs [3] [4] [19], and “pushing-to-the-leaves” [19].
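
As a rough illustration of the nested-ITE scheme, the sketch below replaces successive applications of a single one-argument UF with fresh domain variables guarded by equality tests on the arguments. The tuple encoding of terms, the function names, and the restriction to one argument are assumptions of this sketch, not the implementation used by the tool.

```python
# Hypothetical sketch of nested-ITE elimination for one uninterpreted function.
# Terms are strings (domain variables) or tuples:
#   ("ITE", cond, then_term, else_term) and ("eq", x, y).

def eliminate_uf(arg_terms, prefix="c"):
    """Replace the i-th application f(arg_i) by a nested ITE over fresh variables.

    f(a1) -> c1
    f(a2) -> ITE(a2 = a1, c1, c2)
    f(a3) -> ITE(a3 = a1, c1, ITE(a3 = a2, c2, c3))
    """
    results = []
    for i, arg in enumerate(arg_terms):
        term = f"{prefix}{i + 1}"              # fresh variable for this application
        for j in range(i - 1, -1, -1):         # build from the innermost ITE outwards
            term = ("ITE", ("eq", arg, arg_terms[j]), f"{prefix}{j + 1}", term)
        results.append(term)
    return results

def evaluate(term, env):
    """Evaluate a term under a concrete assignment of the domain variables."""
    if isinstance(term, str):
        return env[term]
    if term[0] == "eq":
        return evaluate(term[1], env) == evaluate(term[2], env)
    if term[0] == "ITE":
        branch = term[2] if evaluate(term[1], env) else term[3]
        return evaluate(branch, env)
    raise ValueError(f"unknown operator {term[0]!r}")

outs = eliminate_uf(["a1", "a2", "a3"])
env = {"a1": 7, "a2": 7, "a3": 9, "c1": 100, "c2": 200, "c3": 300}
# a1 and a2 are equal under env, so both applications yield the value of c1.
print([evaluate(t, env) for t in outs])
```

Equal arguments always select the same fresh variable, which is how functional consistency is preserved without interpreting the function itself.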

The correctness criterion is based on the flushing technique [5] and is expressed 
by an EUFM formula of the form 

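$$m_{1,0} \wedge m_{2,0} \wedge \dots \wedge m_{n,0} \;\vee\; m_{1,1} \wedge m_{2,1} \wedge \dots \wedge m_{n,1} \;\vee\; \dots \;\vee\; m_{1,k} \wedge m_{2,k} \wedge \dots \wedge m_{n,k}, \qquad (1)$$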

where n is the number of user-visible state elements in the implementation processor, k
is the maximum number of instructions that the processor can fetch in a clock cycle,
and $m_{i,j}$, $1 \le i \le n$, $0 \le j \le k$, is an EUFM formula expressing the condition that
user-visible state element i is updated by the first j instructions from the ones fetched in a
single clock cycle. (See the electronic version of [19] for a detailed discussion.) The
EUFM formulas $m_{1,j}, m_{2,j}, \dots, m_{n,j}$, $0 \le j \le k$, are conjoined in order to ensure that the
user-visible state elements are updated in “sync” by the same number of instructions.
The correctness criterion expresses a safety property that the processor completes 
between 0 and k of the newly fetched k instructions. 
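
The shape of the criterion, a disjunction over the number of completed instructions of a conjunction over all state elements, can be sketched for a single truth assignment as follows. The function name and the list-of-lists layout are assumptions of this sketch; the actual criterion must hold for every assignment, which is what the BDD or SAT evaluation establishes.

```python
def correctness_criterion(m):
    """Evaluate formula (1) for one truth assignment.

    m[i][j] is the truth value of the formula stating that user-visible
    state element i+1 was updated by exactly the first j fetched
    instructions, with 0 <= j <= k.
    """
    k = len(m[0]) - 1
    # Disjunction over j of the conjunction over all state elements.
    return any(all(row[j] for row in m) for j in range(k + 1))

# Two state elements, k = 1: both agree that one instruction completed.
in_sync = [[False, True],
           [False, True]]
# The two elements disagree on how many instructions completed.
out_of_sync = [[True, False],
               [False, True]]

print(correctness_criterion(in_sync))
print(correctness_criterion(out_of_sync))
```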

In our previous work [19] we developed a completely automatic tool that exploits 
the properties of Positive Equality [3] [4], the $e_{ij}$ encoding [8], and a number of
conservative approximations in order to translate the correctness EUFM formula (1) to a
propositional formula. The implementation processor is verified, i.e., is correct, if the
propositional formula obtained from (1) after replacing each $m_{i,j}$ with its
corresponding propositional formula $f_{i,j}$, obtained after the translation, evaluates to true. This
evaluation can be done with either BDDs [2] or SAT-checkers. Our previous research 
[19] showed that BDDs are unmatched by SAT-checkers in the verification of correct 
processors. However, we found that SAT-checkers can very quickly generate counter- 
examples for buggy designs. 

Positive Equality allows the identification of two types of terms in the structure of 
an EUFM formula: those which appear only in positive equations and are called
p-terms, and those which can appear in both positive and negative equations and are 
called g-terms (for general terms). A positive equation is never negated (or appears 
under an even number of negations) and is not part of the controlling formula for an 
ITE operator. A negative equation appears under an odd number of negations or as part 
of the controlling formula for an ITE operator. The computational efficiency from 
exploiting Positive Equality is due to a theorem which states that the truth of an EUFM 
formula under a maximally diverse interpretation of the p-terms implies the truth of the 
formula under any interpretation. The classification of p-terms vs. g-terms is done 
before UFs and UPs are eliminated by nested ITEs, such that if an UF is classified as a 
p-term (g-term), the new domain variables generated for its elimination are also con- 
sidered to be p-terms (g-terms). After the UFs and the UPs are eliminated, a maximally 
diverse interpretation is one where: the equality comparison of two syntactically iden- 
tical (i.e., exactly the same) domain variables evaluates to true; the equality compari- 
son of a p-term domain variable with a syntactically distinct domain variable evaluates 
to false; and the equality comparison of a g-term domain variable with a syntactically 
distinct g-term domain variable could evaluate to either true or false and can be 
encoded with a dedicated Boolean variable, an $e_{ij}$ variable [8].
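
The three cases of a maximally diverse interpretation can be summarized in a short sketch. The function name, the string naming of the fresh Boolean variables, and the dictionary used to share one variable per unordered pair are illustrative assumptions, not details taken from the tool.

```python
def encode_equality(x, y, p_terms, e_vars):
    """Encode the equation x = y for two domain variables after UF/UP elimination.

    p_terms: set of domain variables classified as p-terms.
    e_vars:  dict collecting one fresh Boolean variable per pair of g-terms.
    Returns True, False, or the name of a dedicated Boolean variable.
    """
    if x == y:
        return True                          # syntactically identical variables
    if x in p_terms or y in p_terms:
        return False                         # a p-term differs from any other variable
    key = tuple(sorted((x, y)))              # both are g-terms: share one variable per pair
    return e_vars.setdefault(key, f"e_{key[0]}_{key[1]}")

e_vars = {}
print(encode_equality("d1", "d1", {"d1"}, e_vars))   # identical variables
print(encode_equality("d1", "r1", {"d1"}, e_vars))   # p-term against another variable
print(encode_equality("r2", "r1", set(), e_vars))    # two distinct g-terms
```

Only the last case introduces a Boolean variable, which is why classifying as many terms as possible as p-terms keeps the propositional formula small.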

In order to fully exploit the benefits of Positive Equality, the designer of an 
abstract processor model must use a set of suitable abstractions and conservative 
approximations. For example, an equality comparison of two data operands, as used to 
determine the condition to take a branch-on-equal instruction, must be abstracted with 
an UP in both the implementation and the specification, so that the data operand terms 
will not appear in negated equations but only as arguments to UPs and UFs and hence 
will be classified as p-terms. Similarly, a Finite State Machine (FSM) model of a mem- 
ory has to be employed for abstracting the Data Memory in order for the addresses, 
which are produced by the ALU and also serve as data operands, to be classified as p- 
terms. In the FSM abstraction of a memory, the present memory state is a term that is 
stored in a latch. Reads are modeled with an UF $f_r$ that depends on the present memory
state and the address, while producing a term for the read data. Writes are modeled
with an UF $f_u$ that depends on the present memory state, the address, and a data term,
producing a term for the new memory state, which is to be stored in the latch. The 
result is that data values produced by the Register File, the ALU, and the Data Memory 
can be classified as p-terms, while only the register identifiers, whose equations con- 
trol forwarding and stalling conditions that can be negated, are classified as g-terms. 

We will refer to a transformation on the implementation and specification proces- 
sors as a conservative approximation if it omits some properties, making the new pro- 
cessor models more general than the original ones. Note that the same transformation 
is applied to both the implementation and the specification processors. However, if the
more general model of the implementation is verified against the more general model 
of the specification, so would be the detailed implementation against the detailed spec- 
ification, whose additional properties were not necessary for the verification. 

Proposition 1. The FSM model of a memory is a conservative approximation of a 
memory. 

Proof. If a processor is verified with the FSM model of a memory where the update
function $f_u$ and the read function $f_r$ are completely arbitrary uninterpreted functions
that do not satisfy the forwarding property of the memory semantics, then the
processor will be verified for any implementation of $f_u$ and $f_r$, including $f_u$ = write
and $f_r$ = read. □
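
The idea behind Proposition 1 can be pictured with a small sketch: the FSM model is parameterized by an update function and a read function (called `f_u` and `f_r` here), and an ordinary memory is just one particular choice for them. The class name and the dictionary-based concrete memory are assumptions made for illustration.

```python
class FsmMemory:
    """FSM model of a memory: the whole state is one value held in a latch."""

    def __init__(self, initial_state, f_u, f_r):
        self.state = initial_state
        self.f_u = f_u      # update function: (state, addr, data) -> new state
        self.f_r = f_r      # read function:   (state, addr) -> data

    def write(self, addr, data):
        self.state = self.f_u(self.state, addr, data)

    def read(self, addr):
        return self.f_r(self.state, addr)

# Fully uninterpreted instance: applications are only recorded as terms.
abstract = FsmMemory("m0",
                     f_u=lambda s, a, d: ("f_u", s, a, d),
                     f_r=lambda s, a: ("f_r", s, a))

# One particular interpretation: an ordinary memory with the forwarding property.
concrete = FsmMemory({},
                     f_u=lambda s, a, d: {**s, a: d},
                     f_r=lambda s, a: s.get(a))

for mem in (abstract, concrete):
    mem.write("a1", "d1")

print(abstract.read("a1"))   # an uninterpreted term over the recorded update
print(concrete.read("a1"))   # the written data
```

A property proved for arbitrary `f_u` and `f_r` therefore also holds for the concrete pair, which is the sense in which the FSM model is more general.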



3 VLIW Architecture Verified 

The goal of this paper is to formally verify the VLIW processor shown in Fig. 1. On 
every clock cycle, the Fetch Engine produces a packet of 9 instructions that are already 
matched with one of 9 functional units: 4 integer (Int FU), 2 floating-point (FP FU), 
and 3 branch-address (BA FU). There are no data dependencies among the instructions 
in a packet, as guaranteed by the compiler. Any number of instructions in a packet can 
be valid, i.e., can have the potential to modify user-visible state. Data values are stored 
in 4 register files — Integer (Int), Floating-Point (FP), Branch-Address (BA), and Pred- 
icate (Pred). A location in the Predicate Register File contains a single bit of data. 
Every instruction is predicated with a qualifying predicate register, such that the result 
of the instruction is written to user-visible state only if the qualifying predicate register 
has a value of 1. The predication is done at compile time. 

A Current Frame Marker register (CFM) is used to remap the register identifiers 
for accessing the Integer, Floating-Point, and Predicate Register Files. The CFM can 
be modified by every instruction in a packet. Two of the Integer Functional Units can 
generate addresses for accessing the Data Memory that is used for storing both integer 
and floating-point values. An Advanced Load Address Table (ALAT) is used as hard- 
ware support for advanced and speculative loads — a compile-time speculation. Every 
data address accessed by an advanced/speculative load is stored in the ALAT and is 
evicted from there by subsequent store instructions overwriting the same address. Spe- 
cial instructions check if an address, that was accessed by an advanced/speculative 
load, is still in the ALAT. If not, these instructions perform a branch to code where the 
load will be repeated non-speculatively together with any computations that have been
performed on the incorrect speculative data. Both the CFM and the ALAT are user-visible
state elements. A Branch Predictor supplies the Fetch Engine with one prediction
on every clock cycle.

Fig. 1. Block diagram of the VLIW architecture that is formally verified.

The Predicate Register File can be updated with predicate values computed by 
each of the 4 integer and 2 floating-point functional units. A predicate result depends 
on the instruction Opcode, the integer or floating-point data operands, respectively, and 
the value of a source predicate register. The Predicate Register File contents can be 
overwritten entirely or partially with the value of a data operand, as determined by an 
instruction Opcode for those 6 functional units. Note that each predicate value consists 
of a single bit, so that the contents of the Predicate Register File is the concatenation of 
these bits for all predicate register identifiers. The Predicate Register File contents can 
be moved to a destination integer register by each of the 4 integer functional units. 

Integer values from the Integer Register File can be converted to floating-point 
values and written to a destination in the Floating-Point Register File by each of the 
two floating-point functional units. These functional units can similarly convert a float- 
ing-point value to an integer one, which gets written to a destination in the Integer 
Register File. The floating-point functional units also perform floating-point computa- 
tions on 2 floating-point operands. The 2 floating-point ALUs, the Instruction Mem- 
ory, and the Data Memory can each have a multicycle and possibly data-dependent 
latency for producing a result. 

The value of a register in the Branch-Address Register File can be transferred to a
destination in the Integer Register File by each of the 4 integer functional units.
Furthermore, the integer functional units can perform computations for which one of the
operands is supplied by the Branch-Address Register File, the other by the Integer 
Register File, with the result being written to a destination in the Branch-Address Reg- 
ister File. The values in that register file are used by the 3 branch-address functional 
units for computing of branch target addresses, which also depend on the PC of the 
VLIW instruction packet and the instruction Opcode for the corresponding functional 
unit. A branch is taken if its qualifying predicate evaluates to 1.

The VLIW processor that is formally verified has 5 pipeline stages (see Fig. 1): 
Fetch, Register-Files-Access, Execution, Data-Memory-Access, and Write-Back. The 
Fetch Engine is implemented as a Program Counter (PC) that accesses a read-only 
Instruction Memory in order to produce a VLIW packet of 9 instructions. Forwarding 
is employed in the Execution stage in order to avoid read-after-write hazards for data 
values read from each of the 4 register files by supplying the latest data to be written to 
the corresponding source register by the instructions in the Data-Memory-Access and 
Write-Back stages. However, forwarding is not possible for integer and floating-point 
values loaded from the Data Memory and used by an instruction in the next packet. In 
such cases, the hazards are avoided by stalling the entire packet of the dependent 
instruction(s) in the Register-Files-Access stage and inserting a bubble packet in the 
Execution stage. Testing a qualifying predicate register for being 1 is done in the Exe- 
cution stage after forwarding the most recent update for that predicate register. 

The updates of the CFM are done speculatively in the Register-Files-Access stage
in order not to delay the execution of the instructions in subsequent packets. However, 
the original value of the CFM is carried down the pipeline together with the packet that 
modified it. Should that packet be squashed in a later pipeline stage, then its modifica- 
tion of the CFM should not have been done and the original CFM value is restored. 

Accounting for all control bits that affect the updating of user-visible state and for
all possible forwarding and stalling conditions, an exhaustive binary simulation need 
consider 2 ^^^ sequences of 5 VLIW instruction packets each, based on the Burch and 
Dill flushing technique. This number increases significantly when also considering 
possible mispredictions or correct predictions of branches, and single-/multi-cycle
latencies of the multicycle functional units and memories. 

4 Discussion 

The above VLIW architecture imitates the Intel Itanium [9] [17] in that it has the same 
numbers and types of execution pipelines, as well as features such as predicated execu- 
tion, register remapping, advanced and speculative loads, and branch prediction. A fur- 
ther similarity is that the Instruction Memory, the two Floating-Point ALUs, and the 
Data Memory can each take single or multiple cycles in order to produce their results. 
The processor also has the same register files as the Itanium, as well as the same capa- 
bilities to move data between them. However, two differences should be mentioned. 

First, the Register-Files-Access stage is distributed across 3 pipeline stages in the 
Intel Itanium [9]. While these 3 stages are very simple, their modeling as separate 
stages resulted in an order of magnitude increase in the CPU time for the formal verifi- 
cation. This was due to an increase in the number of Boolean variables encoding 
equalities (see Sect. 2) between the then much larger numbers of source and
destination register identifiers for accessing the Integer and Floating-Point Register Files.
However, the single Register-Files-Access stage can be viewed as an unpipelined 
implementation of these 3 stages in the Itanium. It will be the focus of our future 
research to prove the correctness of a superpipelining transformation that splits that 
single stage into 3 stages. Alternatively, one can prove the correctness of unpipelining 
the 3 stages into a single stage. Unpipelining has been previously used for merging 
stages in a pipelined processor [13] in order to reduce the formal verification complex- 
ity, although that was possible only after extensive manual intervention for defining 
induction hypotheses correlating the signals in the two design versions. 

Second, the Itanium Fetch Engine also occupies 3 pipeline stages, the last of 
which dispatches instructions to the 9 functional units. However, the Fetch stage of the 
architecture in Fig. 1 can be viewed as an abstraction of a 3-stage Fetch Engine in that 
it has the same communication mechanism with the Execution Engine (the last 4 
stages in Fig. 1) as the Fetch Engine in the Itanium. Namely, the Fetch stage will
provide the Execution Engine with the same group of inputs on the next clock cycle if it is
stalled by the Execution Engine in the present clock cycle. The refinement of the single
Fetch stage into a detailed model of the Itanium Fetch Engine will be the focus of our
future research. 

On the other hand, the VLIW architecture verified is comparable to, if not more
complex than, the StarCore [10] by Motorola and Lucent. The StarCore is a VLIW
processor that also has a 5-stage pipeline, consisting of exactly the same stages. It
supports predicated execution. However, it has 6 instructions per VLIW packet, does not
implement register remapping, does not support floating-point computations (so that it
does not have a Floating-Point Register File), does not execute advanced and
speculative loads, and does not have branch prediction.

5 Exploiting Conservative Approximations 

The CFM and the ALAT were abstracted with finite state machines (FSMs), similar to
the Data Memory (see Sect. 2). In the case of the CFM, shown in Fig. 2, the first
instruction in a packet modifies the present state via an UF that depends on the CFM
present state and the Opcode of the instruction. Each subsequent instruction similarly
modifies the latest CFM state obtained after the updates by the preceding instructions
in the packet. The final modified CFM state is written to the CFM latch only if the
instruction packet is valid. In the case of the ALAT, each valid load or store instruction
modifies the present state. The modification is done by an UF that depends on the
present state of the ALAT, the instruction Opcode, and the address for the Data
Memory access. In this way, an arbitrary update is modeled, including the actual update that
takes place only if the Opcode designates an advanced/speculative load or a store
instruction. The special check instructions, which check whether the address of an
advanced/speculative load has been evicted from the ALAT, are modeled with an UP
that depends on the present ALAT state, the instruction Opcode, and the checked
address. That UP produces a Boolean signal indicating whether to take a branch to an
address provided in the instruction encoding in order to redo the speculative
computations done with possibly incorrect data. Because the UFs used in the abstraction of the
CFM and the ALAT do not model any actual properties of these units, the FSM models
of the CFM and the ALAT are conservative approximations (see Proposition 1). Note
that the ALAT is part of the user-visible state and is used in the non-pipelined
specification, as it provides support for static compile-time speculation (advanced and
speculative loads), as opposed to dynamic run-time speculation that is done only in the
implementation processor.




Fig. 2. Abstraction of the CFM. Uninterpreted functions abstract the functional
units that remap register ids provided in an instruction encoding into new register ids to be
used for accessing a register file. The updates of the CFM are abstracted with an UF. The
original value CFMO is restored if the packet is squashed in a later stage. The remapping and
updating UFs are shown for only one instruction. Register remapping is a compile-time optimization.

The Predicate Register File was also abstracted with an FSM, as shown in Fig. 3. 
The latest updated state of the entire Predicate Register File is available in the Execu- 
tion stage, because the VLIW architecture is defined to be able to transfer the entire 
contents of the Predicate Register File to a destination in the Integer Register File. 




Fig. 3. Abstraction of the Predicate Register File (PRF). An uninterpreted function abstracts
the updating of the PRF in both the FSM model of this register file and its forwarding logic.
Each predicate result in flight gets reflected on the latest PRF state by one application of that UF.
Uninterpreted predicates p_valid1, ..., p_valid9 abstract the testing of a qualifying predicate register
for being 1.

Each level of forwarding logic for the Predicate Register File is abstracted with an
application of the same UF that is used to update the state of that register file in its 
FSM abstraction. The value of a qualifying predicate register identifier is tested for 
being true/false by means of an UP that depends on the latest Predicate Register File 
state and the qualifying predicate register identifier. Thus, an arbitrary predicate is 
modeled, including the actual multiplexor selecting the predicate bit at the location 
specified by the qualifying predicate register identifier. The instruction Opcode is 
included as an extra input to that UP in order to achieve partial functional non-consis- 
tency [18] of the UP across different instructions. Including extra inputs is a conserva- 
tive approximation, because if the processor is verified with the more general UP (UF), 
it would be correct for any implementation of that UP (UF), including the original one 
where the output does not depend on the extra inputs. 

The Branch-Address Register File was abstracted automatically. Originally, it was 
implemented as a regular memory in the Register-Files-Access stage with regular
forwarding logic in the Execution stage. Then, the following transformations were
applied automatically by the evaluation tool on the EUFM correctness formula, start- 
ing from the leaves of the formula: 

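$$\mathit{read}(m, a) \;\rightarrow\; f_r(m, a) \qquad (2)$$

$$\mathit{write}(m, a, d) \;\rightarrow\; f_u(m, a, d) \qquad (3)$$

$$\mathit{ITE}(e \wedge (ar = aw),\, d,\, f_r(m, ar)) \;\rightarrow\; f_r(\mathit{ITE}(e,\, f_u(m, aw, d),\, m),\, ar) \qquad (4)$$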

Transformations (2) and (3) are the same as those used in the abstraction of the Data 
Memory, described in Sect. 2. Transformation (4) occurs in the cases when one level of 
forwarding logic is used to update the data read from address ar of the previous state m 
for the memory, where function read is already abstracted with UF $f_r$. Accounting for
the forwarding property of the memory semantics that was satisfied before function
read was abstracted with UF $f_r$, the left-hand side of (4) is equivalent to a read from
address ar of the state of memory m after a write to address aw with data d is done
under the condition that formula e is true. On the right-hand side of (4), functions read
and write are again abstracted with $f_r$ and $f_u$ after accounting for the forwarding
property. Multiple levels of forwarding are abstracted by recursive applications of (4),
starting from the leaves of the correctness formula. Uninterpreted functions $f_u$ and $f_r$ can be
automatically made unique for every memory, where a memory is identified by a
unique domain variable serving as the memory argument at the leaves of a term that
represents memory state. Hence, $f_u$ and $f_r$ will no longer be functionally consistent
across memories, which is yet another conservative approximation.
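
Transformations (2)-(4) amount to a bottom-up term rewrite. The sketch below shows one way such a rewrite could look on tuple-encoded terms; the encoding, the operator names `"and"` and `"eq"`, and the function name `abstract_memory` are assumptions of this sketch and do not describe the actual tool.

```python
def abstract_memory(term):
    """Apply transformations (2)-(4) to a tuple-encoded term, leaves first."""
    if not isinstance(term, tuple):
        return term                                   # domain or propositional variable
    op, *args = term
    args = [abstract_memory(a) for a in args]         # rewrite the subterms first
    if op == "read":
        return ("f_r", *args)                         # transformation (2)
    if op == "write":
        return ("f_u", *args)                         # transformation (3)
    if op == "ITE" and len(args) == 3:
        cond, d, other = args
        is_forwarding = (
            isinstance(cond, tuple) and len(cond) == 3 and cond[0] == "and"
            and isinstance(cond[2], tuple) and len(cond[2]) == 3 and cond[2][0] == "eq"
            and isinstance(other, tuple) and len(other) == 3 and other[0] == "f_r"
            and cond[2][1] == other[2]                # the compared address is the read address
        )
        if is_forwarding:
            e, aw = cond[1], cond[2][2]
            m, ar = other[1], other[2]
            return ("f_r", ("ITE", e, ("f_u", m, aw, d), m), ar)   # transformation (4)
    return (op, *args)

# One level of forwarding in front of a read from memory m at address ar.
one_level = ("ITE", ("and", "e", ("eq", "ar", "aw")), "d", ("read", "m", "ar"))
print(abstract_memory(one_level))
```

Because subterms are rewritten before their parent, a second forwarding level wrapped around `one_level` is matched by the same rule, which corresponds to the recursive application of (4) described above.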

After all memories are automatically abstracted as presented above, the tool 
checks if an address term for an abstracted memory is used in a negated equation out- 
side the abstracted memories, i.e., is a g-term. If so, then the abstractions for all such 
memories are undone. Hence, abstraction is performed automatically only for a mem- 
ory whose addresses are p-terms outside the memory and the forwarding logic for it. 
From Proposition 1, it follows that such an abstraction is a conservative approxima- 
tion. The fact that address terms of abstracted memories are used only as p-terms out- 
side the abstracted memories avoids false negatives that might result when a (negated) 
equation of two address terms will imply that a write to one of the addresses will (not)
affect a read from the other address in the equation when that read is performed later,
a property that is lost in the abstraction with UFs. In the examined architecture, the
Integer and Floating-Point Register Files end up having g-term addresses after abstrac- 
tion because of the stalling logic that enforces a load interlock avoiding the data hazard 
when a load provides data for an immediately following dependent instruction. Hence, 
only the Branch-Address Register File, which is not affected by load interlocks, is
abstracted automatically. The Data Memory was abstracted manually with an FSM so
that its UFs for reading and updating take the Opcode as an extra input, both to
model byte-level memory accesses and to achieve partial functional
non-consistency. However, if the Opcode is not used as an extra input and the Data
Memory is defined as a regular memory, then the algorithm will automatically achieve 
conservative approximation of the Data Memory by applying transformations (2) and 
(3) only. 

Note that the abstraction of the Branch-Address and Predicate Register Files 
helps avoid the introduction of Boolean variables that would have been required 
otherwise in order to encode the equality comparisons of branch-address and predicate 
register identifiers, respectively, as would have been needed by unabstracted memories 
and forwarding logic. 

An UF was used as a “translation box” for the new PC value before it is written to 
the PC latch, similar to our previous work [20]. That UF models an arbitrary modifica- 
tion of the new PC values. Therefore, this transformation is a conservative approxima- 
tion in that if a processor is verified with a translation box, it would be correct for any 
implementation of that UF, including the identity function that simply connects the 
input to the output — the case before inserting the translation box. However, that UF 
results in common subexpression substitution, reducing the complexity of the PC val- 
ues, and hence of the final equality comparisons for the state of the PC as used in the 
correctness formula. 

6 Decomposing the Computation of the Correctness Criterion 

The problem encountered in formally verifying variants of the VLIW architecture
presented in Sect. 3 is that the monolithic evaluation of the propositional formula
obtained from (1) does not scale well with increasing design complexity. The
solution proposed here is a framework for decomposing the monolithic evaluation into a
set of simpler evaluations, each of which depends on a subset of the propositional
formulas $f_{i,j}$ obtained after the translation of the EUFM formulas $m_{i,j}$ in (1) to
propositional logic. Furthermore, these simpler evaluations are not dependent on each other,
so that they can be performed in parallel. Some terminology first.

A literal is an instance of a Boolean variable or its complement. Any Boolean
function that depends on n Boolean variables can be expressed as a sum of products
(disjunction of conjuncts) of n literals, called the minterms of the function.
Equivalently, a Boolean function can be interpreted as the set of its minterms. Then, the
disjunction and the conjunction of two Boolean functions are the union and intersection
of their minterm sets, respectively. Stating that $a \subseteq b$, where a and b are two Boolean
functions, is equivalent to saying that the set of minterms of b includes the minterms of a,
while $a \subset b$ means that b includes extra minterms in addition to the minterms of a. We
will say that the k+1 Boolean functions $b_0, b_1, \dots, b_k$ form a base if they cover the
entire Boolean space, i.e., $b_0 \vee b_1 \vee \dots \vee b_k$ = true, and are pair-wise disjoint, i.e.,
$b_i \wedge b_j$ = false for all $i \ne j$, $0 \le i, j \le k$. We will refer to functions $b_i$, $0 \le i \le k$, that form
a base as base functions. The definition of base functions is identical to that of disjoint
window functions in Partitioned-ROBDDs [14].

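Proposition 2. If the Boolean functions $b_0, b_1, \dots, b_k$ form a base, and
$b_0 \wedge a_0 \vee b_1 \wedge a_1 \vee \dots \vee b_i \wedge a_i \vee \dots \vee b_k \wedge a_k$ = true, where $a_i$, $0 \le i \le k$, are Boolean
functions, then $b_i \subseteq a_i$ for all i, $0 \le i \le k$.

Proof. The proposition is proved by contradiction. Let $b_j \not\subseteq a_j$ for some j, $0 \le j \le k$.
Then $a_j$ does not contain at least one minterm of $b_j$, so that $b_j \wedge a_j \subset b_j$, since the
conjunction of two Boolean functions can be interpreted as the intersection of their
minterm sets. Furthermore, since $b_i \wedge a_i \subseteq b_i$ for $i \ne j$, $0 \le i \le k$, and the base
functions are pair-wise disjoint, it follows that
$b_0 \wedge a_0 \vee b_1 \wedge a_1 \vee \dots \vee b_j \wedge a_j \vee \dots \vee b_k \wedge a_k \subseteq b_0 \vee b_1 \vee \dots \vee b_j \wedge a_j \vee \dots \vee b_k \subset b_0 \vee b_1 \vee \dots \vee b_j \vee \dots \vee b_k$ = true.
Hence, $b_0 \wedge a_0 \vee b_1 \wedge a_1 \vee \dots \vee b_j \wedge a_j \vee \dots \vee b_k \wedge a_k \ne$ true,
which is a contradiction to the assumption in the proposition. □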

Proposition 3. If the Boolean functions $b_0, b_1, \dots, b_k$ form a base, and
$b_0 \vee \dots \vee b_{i-1} \vee a_i \vee b_{i+1} \vee \dots \vee b_k$ = true, $0 \le i \le k$ (i.e., replacing one $b_i$ with a Boolean function
$a_i$ does not change the value of the disjunction), then $b_i \subseteq a_i$.

Proof. The proposition is proved by contradiction. Let $b_i \not\subseteq a_i$. Then $a_i$ does not contain
at least one minterm of $b_i$, and since that minterm is not contained in any other $b_j$, $j \ne i$,
$0 \le j \le k$, as the base functions are pair-wise disjoint, it follows that
$b_0 \vee \dots \vee b_{i-1} \vee a_i \vee b_{i+1} \vee \dots \vee b_k \ne b_0 \vee \dots \vee b_{i-1} \vee b_i \vee b_{i+1} \vee \dots \vee b_k$ = true. Therefore,
$b_0 \vee \dots \vee b_{i-1} \vee a_i \vee b_{i+1} \vee \dots \vee b_k \ne$ true, which is a contradiction to the
assumption in the proposition. □

Proposition 4. If the Boolean functions $b_0, b_1, \dots, b_k$ form a base, and $b_i \subseteq a_i$ for some
i, $0 \le i \le k$, where $a_i$ is a Boolean function, then
$b_0 \vee \dots \vee b_{i-1} \vee b_i \wedge a_i \vee b_{i+1} \vee \dots \vee b_k$ = true.

Proof. If $b_i \subseteq a_i$ then $b_i \wedge a_i = b_i$. Therefore,
$b_0 \vee \dots \vee b_{i-1} \vee b_i \wedge a_i \vee b_{i+1} \vee \dots \vee b_k = b_0 \vee \dots \vee b_{i-1} \vee b_i \vee b_{i+1} \vee \dots \vee b_k$ = true. □
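
Propositions 2-4 can be sanity-checked by brute force on a tiny instance. The sketch below fixes one base over two Boolean variables and enumerates every pair of Boolean functions, each represented as the set of its minterms. The choice of base, the names, and the restriction to k = 1 are assumptions of this sketch; it is a check on a small case, not a proof.

```python
from itertools import product

N = 2                                                   # number of Boolean variables
SPACE = frozenset(product((False, True), repeat=N))     # all minterms over (x, y)
ORDERED = sorted(SPACE)

# Every Boolean function of N variables, represented as its set of minterms.
ALL_FUNCTIONS = [frozenset(m for m, keep in zip(ORDERED, bits) if keep)
                 for bits in product((False, True), repeat=len(ORDERED))]

# A base with k = 1: b0 = not x, b1 = x. It covers SPACE and is disjoint.
b0 = frozenset(m for m in SPACE if not m[0])
b1 = SPACE - b0

def check_propositions():
    for a0, a1 in product(ALL_FUNCTIONS, repeat=2):
        # Proposition 2: b0&a0 | b1&a1 = true implies b0 <= a0 and b1 <= a1.
        if ((b0 & a0) | (b1 & a1)) == SPACE and not (b0 <= a0 and b1 <= a1):
            return False
        # Proposition 3: a0 | b1 = true implies b0 <= a0.
        if (a0 | b1) == SPACE and not (b0 <= a0):
            return False
        # Proposition 4: b0 <= a0 implies b0&a0 | b1 = true.
        if b0 <= a0 and ((b0 & a0) | b1) != SPACE:
            return False
    return True

print(check_propositions())
```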

Propositions 2-4 allow us to decompose the evaluation of the propositional formula
for the correctness criterion. First, we have to find a set of Boolean functions
$b_0, b_1, \dots, b_k$ that form a base, where each $b_i$ is selected as the conjunction of a subset of
the Boolean functions $f_{1,i}, f_{2,i}, \dots, f_{n,i}$, which are conjoined in the
propositional translation of (1). Then, for each $f_{j,i}$, $1 \le j \le n$, not used in forming $b_i$, we have to
prove that its set of minterms is a superset of the minterms of $b_i$ by using either
Proposition 2 or Proposition 3. Then, by Proposition 4, we can compose the results, proving
that the monolithic Boolean formula for the processor correctness evaluates to true 
without actually evaluating it. 

As an example, let’s consider a pipelined processor that can fetch either 0 or 1 
new instructions on every clock cycle and that has 3 user-visible state elements — PC, 
Data Memory (DM), and Register File (RF). In order for the processor to be correct,
we will have to prove that a Boolean formula of the form
$PC_0 \wedge RF_0 \wedge DM_0 \vee PC_1 \wedge RF_1 \wedge DM_1$ evaluates to true. However, instead of evaluating the monolithic
Boolean formula, we can prove that $PC_0$ and $PC_1$ form a base, i.e., $PC_0 \vee PC_1$ = true
and $PC_0 \wedge PC_1$ = false. Additionally, we can prove that, for example:

1. $PC_0 \wedge DM_0 \vee PC_1 \wedge DM_1$ = true, which implies that $PC_0 \subseteq DM_0$ and
$PC_1 \subseteq DM_1$ by Proposition 2;

2. $RF_0 \vee PC_1$ = true, which implies that $PC_0 \subseteq RF_0$ by Proposition 3; and

3. $PC_0 \vee RF_1$ = true, which implies that $PC_1 \subseteq RF_1$ by Proposition 3.

Then, by Proposition 4, it follows that:

$$PC_0 \wedge RF_0 \wedge DM_0 \vee PC_1 \wedge RF_1 \wedge DM_1 = \mathit{true}.$$
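
The example can be replayed on concrete Boolean functions. In the sketch below the six functions are arbitrary choices over two variables x and y, picked only so that the three simpler checks succeed; they are assumptions of this illustration and are not derived from any processor model.

```python
from itertools import product

SPACE = frozenset(product((False, True), repeat=2))     # all assignments to (x, y)

def minterms(fn):
    """Represent a Boolean function by the set of assignments where it is true."""
    return frozenset(a for a in SPACE if fn(*a))

# Hypothetical functions standing in for the translated formulas.
PC0, PC1 = minterms(lambda x, y: not x), minterms(lambda x, y: x)
DM0, DM1 = minterms(lambda x, y: (not x) or y), minterms(lambda x, y: x or y)
RF0, RF1 = minterms(lambda x, y: (not x) or (not y)), minterms(lambda x, y: x)

# PC0 and PC1 form a base.
is_base = ((PC0 | PC1) == SPACE) and not (PC0 & PC1)

# The three simpler, mutually independent evaluations from the example.
check1 = ((PC0 & DM0) | (PC1 & DM1)) == SPACE           # Proposition 2
check2 = (RF0 | PC1) == SPACE                           # Proposition 3
check3 = (PC0 | RF1) == SPACE                           # Proposition 3

# The monolithic formula, evaluated here only to confirm the composed result.
monolithic = ((PC0 & RF0 & DM0) | (PC1 & RF1 & DM1)) == SPACE

print(is_base, check1, check2, check3, monolithic)
```

Each of the three checks involves fewer functions than the monolithic formula, which is the source of the expected savings when the functions are large BDDs or SAT instances.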

If computing resources are available, we can run a redundant set of computations 
in parallel, i.e., using multiple sets of Boolean functions as bases, and exploiting both 
Propositions 2 and 3. Note that Proposition 2 can be applied in a version where only
one of the base functions $b_i$, $0 \le i \le k$, is conjoined with a Boolean function $a_i$ by
setting $a_j$ = true for all $j \ne i$, $0 \le j \le k$. Also note that both Propositions 2 and 3 hold when
$a_i$, $0 \le i \le k$, is the conjunction of several Boolean functions. Furthermore, we can use
both BDDs and SAT-checkers. All computations that are still running can be stopped 
as soon as enough containment properties have been proved, so that they can be com- 
posed by Proposition 4 in order to imply that the monolithic Boolean formula for the 
correctness criterion will evaluate to true. Additionally, the simpler Boolean computa- 
tions will speed up the generation of counterexamples for buggy designs. 

Note that the proposed decomposition of the Boolean evaluation of the correct- 
ness formula does not require much expertise and can be done automatically. Namely, 
the Boolean functions that form a base can be selected automatically by choosing such 
$f_{j,i}$ from the propositional translation of (1) that have the smallest number of Boolean
variables before their BDD is built or, alternatively, the smallest number of nodes in 
their EUFM representation. The base functions serve as automatic case-splitting 
expressions. In his work, Burch had to manually identify 28 case-splitting expressions 
[6]. He also had to manually decompose the commutative diagram for the correctness 
criterion into three diagrams. That decomposition required the intervention of an 
expert user, and was sufficiently subtle to warrant publication of its correctness proof 
as a separate paper [21]. 

7 Modeling Multicycle Functional Units and Branch Prediction 

Multicycle functional units are abstracted with “place holders” [20], where a UF
abstracts the function of the unit, while a new Boolean variable is produced by a gener- 
ator of arbitrary values on every clock cycle in order to express the completion of the 
multicycle computation in the present cycle. Under this scheme, however, a multicycle 
computation will never finish under the assignments where the Boolean variables 
expressing completion evaluate to false. This is avoided by forcing the completion
signal to true during the flushing of the processor, based on the observation [6] that
the logic of the processor can be modified during flushing, as all it does is complete
the instructions in flight. A mistake in this modification can only result in a false negative.
The correct semantics of the multicycle functional unit is modeled in the non-pipelined
specification only by the UF used to abstract the functionality. The place holder for a 
multicycle memory is defined similarly, except that a regular memory or an FSM 
abstraction of a memory is used instead of the UF. 

The Branch Predictor in the implementation processor is abstracted with a gener- 
ator of arbitrary values [20], producing arbitrary predictions for both the taken/not- 
taken direction (represented with a new Boolean variable) and the target (a new 
domain variable) of a branch. What is verified is that if the implementation processor
speculatively updates the PC according to a branch prediction made in an early stage
of the pipeline and the prediction is incorrect as determined when the actual direction 
and target become available, then the processor has mechanisms to correct the mispre- 
diction. The non-pipelined specification does not include a Branch Predictor, which is 
not part of the user-visible state and is irrelevant for defining the correct instruction 
semantics. Note that if an implementation processor is verified with completely
arbitrary predictions for the direction and target of a branch, that processor will be correct
for any actual implementation of the Branch Predictor. 

8 Experimental Results 

The base benchmark, 9xVLIW, implements all of the features of the VLIW architec- 
ture presented in Sect. 3 except for branch prediction and multicycle functional units. 
Conservative approximations were used for modeling the Predicate Register File, the 
CFM, and the ALAT (see Sect. 5). The Branch-Address Register File was abstracted 
automatically, as described in Sect. 5. In order to measure the benefit of using
conservative approximations, 9xVLIW was also formally verified with an unabstracted
Branch-Address Register File that was left as a regular memory with regular forwarding
logic; this variant is 9xVLIW.uBARF. The base benchmark was extended with branch prediction
in order to create 9xVLIW-BP, which was then extended with multicycle functional 
units — the Instruction Memory, the 2 floating-point ALUs, and the Data Memory — in 
order to build 9xVLIW-BP-MC. These models were verified against the same non- 
pipelined VLIW specification, whose semantics was defined in terms of VLIW 
instruction packets. Because of that, there was no need to impose constraints that the 
instructions in a packet do not have data dependencies between each other. 

The VLIW implementation processors were described in 3,200 - 3,700 lines of 
our HDL that supports the constructs of the logic of EUFM, while the non-pipelined 
VLIW specification processor was described in 730 lines of the same HDL. This HDL 
is very similar to Verilog, so that the abstract description of the implementation proces- 
sor can be viewed as a level in a hierarchical Verilog description of the processor. At 
that level, functional units and memories are left as modules (black boxes), while the 
control logic that glues these modules together is described entirely in terms of logic 
gates, equality comparators, and multiplexors (the constructs of EUFM). Then, based 
on information provided in the description and identifying whether a module is either a 
single-cycle or multicycle functional unit or memory, each of the modules can be
automatically replaced by a UF, or an abstract memory, or a place holder for a multicycle
functional unit or memory. Similarly, modules like the ALAT (see Sect. 5) can be
automatically replaced with an FSM-based abstraction. Hence, the abstract description of
the implementation processor can be generated automatically.

The results are presented in Table 1. The experiments were performed on a 336 
MHz Sun4 with 1.2 GB of memory. The Colorado University BDD package [7] and 
the sifting BDD variable reordering heuristic [15] were used to evaluate the final prop- 
ositional formulas. Burch’s controlled flushing [6] was employed for all of the designs. 



Processor   Correctness   BDD Variables   Max. BDD   Memory   CPU Time

9xVLIW.uBARF   monolithic   1,816   236   2,052   4.8   13.68   2.6
1,176   2.9   5.28
1,364   3.3   1.3   2.54
1,384
monolithic   2,615   6.8   31.57   3.97
decomposed   sufficient   1,469   214   1,683   74   7.96
max.   1,469   214   1,683   4.2   97

Table 1. BDD-based Boolean evaluation of the correctness formula. The BDD and memory 
statistics for the decomposed computations are reported as: “sufficient” — the maximum in each 
category across the computations that were sufficient to prove that the monolithic Boolean 
formula is a tautology; “max.” — the maximum in each category across all computations run. The 
CPU time reported for the decomposed computations is the minimum that was required for 
proving a sufficient set of Boolean containment properties that would imply the monolithic 
Boolean correctness formula is a tautology. 

Decomposition of the Boolean evaluation of the correctness formula accelerated
the verification of the most complex benchmark, 9xVLIW-BP-MC, by a factor of almost 4.
Furthermore, the CPU time ratio would have been much higher than 3.97 if it were
calculated relative to the CPU time for the monolithic verification of a version of that
processor where conservative approximations are not used to abstract the Branch-Address
and Predicate Register Files and their forwarding logic. Indeed, the CPU time ratio for
the monolithic verification of 9xVLIW.uBARF (where only the Branch-Address Register
File is not abstracted) vs. the CPU time for the decomposed verification of
9xVLIW is 7.86. Note that with faster new computers, the CPU times for the formal
verification will decrease, while the CPU time ratio of monolithic vs. decomposed 
computation can be expected to stay relatively constant for the same benchmarks. 

Although decomposition always reduced the number of BDD variables (by up
to 35%, compared to the monolithic evaluation), some of the simpler computations
required almost 5 times as many BDD nodes and more than 3 times as much memory
as the monolithic computation. This was due to differences in the structures of the
evaluated Boolean formulas, some of which proved to be difficult for the sifting 
dynamic BDD variable reordering heuristic. However, the computations that were suf- 
ficient to prove that the monolithic formula is a tautology always required fewer 
resources than the monolithic computation. 

The examined processors had 8 user-visible state elements: PC; 4 register files — 
Integer, Floating-Point, Predicate, and Branch-Address; CFM; ALAT; and Data Mem- 
ory. The Boolean formulas that form a base were selected to be the ones from the 
equality comparisons for the state of the CFM, which has a relatively simple updating 
logic (see Sect. 5). Proving that these functions form a base took between 6 seconds in 
the case of 9xVLIW and 54 seconds in the case of 9xVLIW-BP-MC, while the number 
of BDD nodes varied between 10,622 and 52,372, and the number of BDD variables 
was between 255 and 411. 

The relatively simple Boolean formulas formed from the equality comparisons 
for the state of the ALAT were used in order to perturb the structure of the evaluated 
Boolean formulas. Specifically, instead of proving only that $CFM_i \subseteq f_i$ for
some $i$ (i.e., the base function $CFM_i$ is contained in $f_i$), where $f_i$ is the
Boolean formula formed from the equality comparison for the state of some user-visible
state element, a simultaneous proof was run for $CFM_i \subseteq f_i \wedge ALAT_i$.
This strategy resulted in additional Boolean formulas with slightly different structure,
without a significant increase in the number of Boolean variables. Note that the number
of Boolean variables in a formula is not indicative of the time that it would take to
build the BDD for the formula. Sometimes the structural variations helped the sifting
heuristic for dynamic BDD variable reordering to speed up the BDD-based evaluation.
Applied only for Proposition 2, this strategy reduced the formal verification time for
9xVLIW-BP-MC by around half an hour, from the 8.44 hours that would have been required
otherwise to 7.96 hours.

Decompositions based on Proposition 2 usually required less time to prove the 
same containment properties, compared to decompositions with Proposition 3. How- 
ever, Proposition 2 alone would have required 9.64 hours in order to prove a sufficient 
set of containment properties for the most complex model, 9xVLIW-BP-MC. Given 
sufficient computing resources, a winning approach will be to run a large set of com- 
putations in parallel, exploiting different structural variations in the Boolean formulas 
in order to achieve a performance gain for the BDD variable reordering heuristic. 

9 Conclusions 

A VLIW microprocessor was formally verified. Its Execution Engine imitates that of 
the Intel Itanium [9] [16], while its Fetch Engine is simpler. The modeled features are 
comparable to, if not more complex than, those of the StarCore [10] microprocessor by 
Motorola and Lucent. Efficient formal verification was possible after an extensive use 
of conservative approximations — some of them applied automatically — in defining the 
abstract implementation and specification processors, in addition to exploiting a 
decomposed Boolean evaluation of the correctness formula.

Abstract. This paper describes a technique of inductive proof based on
model checking. It differs from previous techniques that combine induction
and model checking in that the proof is fully mechanically checked
and temporal variables (process identifiers, for example) may be natural
numbers. To prove $\forall n.\, \varphi(n)$ inductively, the predicate
$\varphi(n-1) \Rightarrow \varphi(n)$ must be proved for all values of the
parameter $n$. Its proof for a fixed $n$ uses a conservative abstraction that
partitions the natural numbers into a finite number of intervals. This renders
the model finite. Further, the abstractions for different values of $n$ fall
into a finite number of isomorphism classes. Thus, an inductive proof of
$\forall n.\, \varphi(n)$ can be obtained by checking a finite number of
formulas on finite models. The method is integrated with a compositional proof
system based on the SMV model checker. It is illustrated by examples, including
the N-process “bakery” mutual exclusion algorithm.



1 Introduction 

In verifying concurrent or reactive systems, we are often called upon to reason 
about ordered sets. For example, a packet router may be required to deliver a 
sequence of packets in order, or a set of processes may be ordered by a linear or 
grid topology. In such cases, it is convenient to reason by induction. For example,
we show that if packet $i - 1$ is delivered correctly, then so is packet $i$, or that
if a ring arbiter with $n - 1$ processes is live, then so is a ring with $n$ processes.
Indeed, inductive proof may be necessary if the ordered set is unbounded.

For an ordered set of finite state processes, it is natural to consider using 
model checking to prove the inductive step. This was proposed a decade ago by 
Kurshan and McMillan [KM89] and by Wolper and Lovinfosse [WL89]. They 
considered a class of finite-state processes constructed inductively, and an in- 
ductive invariant over this class also expressed as a finite-state process. To prove 
the invariant, it suffices to check the inductive step of the proof for a finite 
number of instances of the induction parameter. This task can be carried out by 
finite-state methods. However, no mechanical method was proposed to check the
validity of this “meta-proof”, or to automatically generate the required instances
of the induction step. Further, the technique was limited to finite-state invariants.
This limitation can be observed, for example, in the work of Henzinger,
Qadeer and Rajamani [HQR99], who constructed an inductive proof of a cache
coherence protocol. The method cannot be applied to protocols that exchange
process identifiers (a very common case in practice) because these are not “finite
state”.

Here, we extend induction via model checking beyond finite state invariants, 
to problems that have not generally been considered amenable to solution by 
model checking. Our proof method uses a general induction scheme that allows 
mutual induction over multiple induction parameters. Models for appropriate 
instances of the induction parameters are generated automatically, and all proof 
steps are mechanically checked. We illustrate these advantages by examples, 
including a proof of safety and liveness of a version of the N-process “bakery”
mutual exclusion algorithm [Lam74]. 

Our technique has been integrated into the SMV proof assistant, a proof
system based on a first order temporal logic [McM97,McM98,McM99].^1 Both the
system to be verified and the specification are expressed in temporal logic, though
with a great deal of “syntactic sugar”. Inductive proofs are reduced to finite state
subgoals in the following way. To prove a predicate $\forall n.\, \varphi(n)$ inductively,
we need to prove $\varphi(n-1) \Rightarrow \varphi(n)$ for all values of the parameter $n$
($\varphi(-1)$ is defined to be true). In general, $\varphi(n)$ may also refer to some
fixed constants such as zero, the number of processes, sizes of arrays, etc. To make the
problem finite-state for a particular value of $n$, we abstract the natural numbers to a
finite set of intervals. Typically, the values $n-1$, $n$, and the fixed constants of
interest are represented by singleton intervals, while each interval between these values
becomes a single abstract value. We observe that the abstractions for values of $n$
satisfying the same inequality relationships over the given terms (e.g., $0 < n-1 < n$)
are isomorphic. Thus it is sufficient to enumerate the feasible inequality relationships
(which we call “inequality classes”) and verify one representative value of $n$ from
each. Thus, we reduce the problem to a finite number of finite-state subgoals.

Related work. This work builds on the techniques developed by McMillan
[McM98,McM99] — temporal case splitting, reduction of cases by symmetry and 
data type reduction — to reduce proof goals to a finite collection of finite-state 
subgoals that can be discharged by model checking. The idea of using model 
checking to prove an inductive step has also been used by Rajan, Shankar and
Srivas in the PVS system [RSS95]. They embedded the mu-calculus in PVS, using
model checking to verify formulas on finite abstract models. These abstractions, 
however, were constructed manually, and their soundness was proved with user 
assistance. Here, the abstractions are constructed automatically and are correct 
by construction. The user specifies at most a set of constants to be used in an ab- 
stract type, although often a suitable set can be inferred automatically. Another 
approach to model checking within a theorem prover is predicate abstraction, 
in which the state of the abstract model is defined by the truth values of a set 
of predicates on the concrete model (e.g., [SG97]). Once suitable predicates are
chosen, decision procedures can construct the abstract model. However, a suitable
set of predicates must still be chosen, manually or heuristically. Saidi and
Shankar used such a method in PVS [ORS92] to verify a two-process mutual
exclusion algorithm, similar to our N-process example [SS99]. Only safety (i.e.,
mutual exclusion) was proved, however, and the two-process case does not require
induction (except over time). Bensalem, Lakhnech and Owre also report an abstraction
technique in PVS that has been used to prove safety in the two-process
case [BLO98]. Das, Dill and Park [DDP99] have extended predicate abstraction
to deal with parameterized systems by using predicates that involve quantifiers,
but they have also considered only safety proofs, not requiring induction. Here,
we prove N-process liveness, by induction over a lexical order. We note further
that unlike predicate abstraction, which in general requires a high-complexity
decision procedure, the present method generates abstract models in linear time.

^1 SMV can be obtained from http://www-cad.eecs.berkeley.edu/~kenmcmil/smv.



2 Type Reductions for Ordered Sets 



We now consider the abstraction technique used in SMV to support induction. 
We begin with the notion of an ordered set type, or ordset. This is a data 
type isomorphic to the natural numbers {0,1,...}. We would like to prove a 
proposition of the form $\forall i.\, p(i)$, where $p$ is a temporal property and $i$ ranges
over an ordset type T. There are two complications in proving this fact using 
model checking. First, p may refer to variables of infinite type T, hence the state 
space is not finite. Second, p(i) must be proved for infinitely many values of i. 
An abstraction of the type T to a finite set can solve both of these problems, in 
that it makes the state space finite, and also reduces the values of i to a finite 
number of equivalence classes. 

As observed in [McM98], if values of type T were used in a symmetric way,
then it would suffice to check only one fixed value of $i$. However, this rules
out expressions such as $x + 1$ or $x < y$, where $x$ and $y$ are of type T. Since
precisely such expressions are needed for proofs by induction over T, we cannot
rely on this kind of symmetry argument. We can, however, induce a symmetry by
abstracting the type T relative to the value of $i$. Suppose we fix a value of $i$. We
may then abstract the type T to a set of three values: $i$ itself, an abstract symbol
representing the semi-closed interval $[0, i)$, and an abstract symbol representing
the open interval $(i, \infty)$. Using a suitable abstract interpretation of each operator
in the logic, we obtain a conservative abstraction: if $p(i)$ is true in the abstract
(finite) model, then it is true in the concrete (infinite) model.

Abstract operators. The abstract operators operate on subsets of the abstract
type. We use $\bot$ to refer to the set of all abstract values. As an example,
consider the successor operation on $x$, denoted $x + 1$, where $x$ is of type
T. The successor of a number in the interval $[0, i)$ might be another value in
$[0, i)$, or it might be $i$, since $[0, i)$ contains $i - 1$. The abstraction does not
provide enough information to determine which is the answer. Thus, we say
$\{[0,i)\} + 1 = \{[0,i),\ i\}$. The successor of $i$ is $i + 1$, which must be in the
interval $(i, \infty)$. Thus, $\{i\} + 1 = \{(i,\infty)\}$. Further, $x + 1$ for any value
of $x$ in $(i, \infty)$ is also in $(i, \infty)$. Thus, in the abstraction,
$\{(i,\infty)\} + 1 = \{(i,\infty)\}$. When operating on non-singleton sets of abstract
values, we simply define the operator so that it is linear with respect to set union.
That is, for example, $(x \cup y) + 1 = (x + 1) \cup (y + 1)$.

Fig. 1. Abstraction of the natural numbers

Fig. 2. Truth tables for abstract comparison operations



This abstract model of type T is depicted in figure 1. As we increment a
value of type T, it stays in the interval $[0, i)$ for some arbitrary amount of time,
then shifts to $i$, then stays forever in the interval $(i, \infty)$. Note that this model is
finite, and homomorphic to the natural numbers, where each number maps to
itself, or the interval that contains it (in the extreme case $i = 0$, note that no
value maps into the interval $[0, i)$, but the homomorphism still holds). Note that
some of the abstract operators are too abstract to be truly useful. For example,
we abstract $x + y$ to be simply $\bot$, and any operator on mixed types yields $\bot$.
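
The three-valued abstraction and its successor operator can be written out directly. The following is a minimal sketch, not SMV's implementation: the names `alpha`, `SUCC`, `abs_succ` and `sound` are hypothetical, and the soundness check only samples a bounded range of naturals.

```python
# Abstract values of type T relative to a fixed parameter i.
LOW, MID, HIGH = "[0,i)", "i", "(i,oo)"

def alpha(x, i):
    """Map a natural number x to its abstract value relative to i."""
    if x < i:
        return LOW
    if x == i:
        return MID
    return HIGH

# Abstract successor on single abstract values, as described in the text.
SUCC = {LOW: {LOW, MID}, MID: {HIGH}, HIGH: {HIGH}}

def abs_succ(values):
    """Abstract successor on a set of abstract values (linear w.r.t. union)."""
    out = set()
    for v in values:
        out |= SUCC[v]
    return out

def sound(i, limit=50):
    """Homomorphism check on a sample: alpha(x+1) must lie in abs_succ({alpha(x)})."""
    return all(alpha(x + 1, i) in abs_succ({alpha(x, i)}) for x in range(limit))
```

The check also passes for i = 0, where no concrete value maps to the low interval, matching the remark that the homomorphism still holds in that extreme case.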

Now consider the operators $x = y$ and $x < y$ over type T. Figure 2 shows
suitable abstract truth tables for these operators. For example, any value in $[0, i)$
is less than $i$, so $\{[0,i)\} < \{i\}$ is true. Therefore, we let $\{[0,i)\} < \{i\}$
equal $\{1\}$. On the other hand, if both $x$ and $y$ are in $[0, i)$, then $x < y$ may be
either true or false, depending on the choice of $x$ and $y$. Thus, let
$\{[0,i)\} < \{[0,i)\}$ equal $\{0, 1\}$. By suitable abstract definitions of all the
operators in the logic, we maintain a homomorphism from the concrete to the abstract
model. Thus, if the abstract model satisfies $p(i)$, then the concrete one does.
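
The abstract comparison operators can be sketched in the same style. The tables below are derived from the reasoning in the text rather than copied from figure 2, so they should be read as one plausible rendering; the function names are hypothetical and the soundness check is a bounded sample.

```python
LOW, MID, HIGH = "[0,i)", "i", "(i,oo)"
ORDER = {LOW: 0, MID: 1, HIGH: 2}

def alpha(x, i):
    """Map a natural number x to its abstract value relative to i."""
    return LOW if x < i else (MID if x == i else HIGH)

def abs_lt(a, b):
    """Possible truth values (0 or 1) of x < y for x in a and y in b."""
    if ORDER[a] < ORDER[b]:
        return {1}
    if ORDER[a] > ORDER[b]:
        return {0}
    # Same abstract value: i < i is false; inside an interval either can happen.
    return {0} if a == MID else {0, 1}

def abs_eq(a, b):
    """Possible truth values (0 or 1) of x = y for x in a and y in b."""
    if a != b:
        return {0}
    return {1} if a == MID else {0, 1}

def sound(i, limit=12):
    """Every concrete comparison result must be allowed by the abstract table."""
    return all(
        int(x < y) in abs_lt(alpha(x, i), alpha(y, i))
        and int(x == y) in abs_eq(alpha(x, i), alpha(y, i))
        for x in range(limit) for y in range(limit))
```

Returning a set of truth values is what makes the abstraction conservative: whenever the intervals do not determine the answer, both outcomes are kept.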

Equivalence classes of abstractions. This abstraction solves the problem
of non-finiteness of the model, but it still leaves us with an infinite set of values
of $i$ to check. Note, however, that the abstraction of type T has been chosen so
that the abstract models for all values of $i$ are isomorphic. Thus, if we denote by
$M_i$ the abstract model for parameter value $i$, then for any two values $x$ and $y$
of type T, $M_x$ satisfies $p(x)$ exactly when $M_y$ satisfies $p(y)$. It therefore suffices
to verify that $M_i$ satisfies $p(i)$ for just one fixed value of $i$ to infer
$\forall i.\, p(i)$. Of course, the abstract model may yield a false negative. However,
we know at least that all values of $i$ will give the same truth value.

Now, suppose that we have more than one parameter to deal with. For example,
we may want to prove $\forall i, j.\, p(i, j)$, where $i$ and $j$ range over type T. In this
case, we introduce more than one concrete value into the abstract type. We do
not, however, expect all the abstract models to be isomorphic. Rather, we must
distinguish three cases: $i < j$, $i = j$ and $j < i$. In the first case, for example, we
can abstract the type T to the set

$\{[0,i),\ i,\ (i,j),\ j,\ (j,\infty)\}$.

The abstract model of type T would then be as pictured in figure 3. Note in
particular that the value of $i + 1$ here can be either $(i,j)$ or $j$. This is because
the interval $(i,j)$ can be either empty or non-empty, and both cases must
be accounted for. The case where $j < i$ is the same, with the roles of $i$ and $j$
reversed, and the case $i = j$ is the same as for the simpler problem above (figure
1). Within each of these three classes, the abstract models are all isomorphic.
This schema can be extended to any number of parameters. However, the number
of equivalence classes increases exponentially with the number of parameters. In
general, we must consider as a separate class each feasible theory of inequality
over the given parameters.

Fig. 3. Abstraction with two parameters

Introduction of fixed constants. Generally, proofs by induction refer to
one or more fixed constants. For example, the base case (typically 0) often occurs
as a constant in the formula $p$, and other constants may occur as well, as for
example the upper bound of an array. Note that in the above system, constants
must be abstracted to $\bot$, otherwise the isomorphism between cases is lost.
However, it is possible to treat fixed constants in the same way as parameters in
the abstract model. The only difference is that certain inequality classes may be
vacuous. For example, suppose that we wish to prove $\forall i.\, p(i)$, and the formula
$p(i)$ contains the constant 0. We have three classes to consider here: $i = 0$, $i < 0$
and $0 < i$. However, since $i$ ranges over the natural numbers, the case $i < 0$ is
infeasible, thus we have only two feasible cases to check. Note that the number
of values in the abstract type may sometimes be reduced when fixed constants
are involved. For example, for the case where $0 < i$, we could abstract the type
T to the set $\{[0,0),\ 0,\ (0,i),\ i,\ (i,\infty)\}$. Since the interval $[0,0)$ is empty,
it can be dropped from the abstraction.

In a proof by induction over the parameter $i$, it is clearly important to be
able to refer to the value $i - 1$. That is, the formula $p(i)$ that we wish to prove
is typically of the form $q(i-1) \Rightarrow q(i)$. If the formula $i - 1$ yielded an abstract
constant, the proof would be unlikely to succeed. However, we can also include
terms of the form $i + c$, where $c$ is a fixed value, in the abstraction without
breaking the isomorphism. It is simply a matter of enumerating all the feasible
inequality relations and eliminating the interval constants that are necessarily
empty. For example, suppose we choose an abstraction using the terms 0, $i$
and $i - 1$. There are three feasible inequality relations among these terms:

$i - 1 < 0 = i$

$0 = i - 1 < i$

$0 < i - 1 < i$

In the case $0 < i - 1 < i$, for example, the intervals $[0,0)$ and $(i-1, i)$ are
necessarily empty. Thus, we abstract the type T to the set
$\{0,\ (0, i-1),\ i-1,\ i,\ (i,\infty)\}$, as shown in figure 4. To prove $p(i)$, we need
only choose one value of $i$ satisfying each inequality relation (for example,
$i = 0, 1, 2$) as all values satisfying a given relation yield isomorphic abstractions.

Fig. 4. Abstract type for induction proof
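
The enumeration of feasible inequality relations can be illustrated by classifying concrete values of i and keeping the first representative of each class. This is a sketch under stated assumptions, not SMV's procedure: the function names are hypothetical, and the classes are discovered by sampling i below a small bound rather than by symbolic reasoning.

```python
def inequality_class(i):
    """Ordering pattern of the terms 0, i-1 and i for a concrete natural i."""
    terms = {"0": 0, "i-1": i - 1, "i": i}
    # Sort by value; ties are broken by name so equal terms print consistently.
    ranked = sorted(terms.items(), key=lambda kv: (kv[1], kv[0]))
    parts, prev = [], None
    for name, val in ranked:
        if prev is None:
            parts.append(name)
        else:
            parts.append(("= " if val == prev else "< ") + name)
        prev = val
    return " ".join(parts)

def representatives(limit=20):
    """Smallest sampled i in each inequality class."""
    reps = {}
    for i in range(limit):
        reps.setdefault(inequality_class(i), i)
    return reps
```

Calling `representatives()` yields the three relations listed above with representatives 0, 1 and 2; every larger sampled i falls into the class of i = 2.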

Example. To see how this abstraction technique can be used in an inductive
proof, suppose we have a counter that starts at 0 and increments at each time
step by one. This can be represented by the following temporal formula $\phi$:

$(x = 0) \wedge G(x' = x + 1)$.

Here, assume $x$ to be a variable of type T. We would like to prove that for all $i$
in T, eventually $x = i$. That is, $\phi \Rightarrow \forall i.\, q(i)$, where $q(i)$ is
$F(x = i)$. By induction, it is sufficient to prove $\forall i.\, p(i)$, where

$p(i) = (\phi \wedge q(i-1)) \Rightarrow q(i)$

(assuming we take $q(-1)$ to be equivalent to true). Using the above abstraction
(containing the values 0, $i - 1$ and $i$) we have three values of $i$ to check, each
with a different abstraction of T. For example, the case $i = 2$ falls in the class
$0 < i - 1 < i$ (fig 4). Note that if $x = i - 1$, then in this abstraction $x$ must
be $i$ at the next time. Thus, if $F(x = i - 1)$, then $F(x = i)$, which is exactly
what we want to prove. Since this is true in the abstract model, it must be true
in the concrete model. The reader might want to consider the other two cases
($i = 0, 1$), and confirm that our property holds under these abstractions as well.
This suffices to prove $p(i)$ for all $i$, since all of the remaining values of $i$ yield
abstractions isomorphic to the case $i = 2$.

Here is how this problem would be entered in the SMV system:

init(x) := 0;
next(x) := x + 1;

forall(i in T) {
  q[i] : assert F (x = i);
  using q[i-1] prove q[i];
}

The system automatically chooses the values 0, $i - 1$, $i$ for the abstraction of
type T (though this can be chosen manually as well). The three feasible inequality
relations among these constants are enumerated automatically, and one representative
value of $i$ from each relation is chosen. The induction step is then
verified by model checking for each corresponding abstraction of type T (this is 
possible because all are finite). SMV then reports that the property q has been 
proved. How SMV recognizes that the induction itself is valid is the subject of 
the next section. 
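
The three abstract checks for the counter example can be mimicked with a small explicit-state computation. This is a rough sketch and not how SMV discharges the subgoals: the abstract domains are written out by hand for the three inequality classes, the successor rule (an interval may be skipped because it may be empty, a singleton may not) is an assumption read off the earlier description of abstract successor, and all function names are hypothetical.

```python
def successors(domain):
    """domain: ordered list of (name, is_interval). Returns name -> set of names."""
    succ = {}
    for k, (name, is_interval) in enumerate(domain):
        out = {name} if is_interval else set()   # a value may stay inside an interval
        for nxt, nxt_is_interval in domain[k + 1:]:
            out.add(nxt)
            if not nxt_is_interval:
                break                            # a singleton cannot be skipped
        succ[name] = out
    return succ

def inevitable(succ, goal):
    """Abstract values from which every path eventually reaches goal."""
    good = {goal}
    changed = True
    while changed:
        changed = False
        for state, nxt in succ.items():
            if state not in good and nxt <= good:
                good.add(state)
                changed = True
    return good

def step_holds(domain, init, pre):
    """Check (phi and q(i-1)) => q(i) in one abstract model.

    pre is the abstract value for i-1, or None when q(i-1) is taken as true."""
    good = inevitable(successors(domain), "i")
    return (init if pre is None else pre) in good

# One hand-written abstract domain per inequality class (representatives i = 0, 1, 2).
CASES = [
    ([("i", False), ("(i,oo)", True)], "i", None),                     # i-1 < 0 = i
    ([("i-1", False), ("i", False), ("(i,oo)", True)], "i-1", "i-1"),  # 0 = i-1 < i
    ([("0", False), ("(0,i-1)", True), ("i-1", False),
      ("i", False), ("(i,oo)", True)], "0", "i-1"),                    # 0 < i-1 < i
]
results = [step_holds(domain, init, pre) for domain, init, pre in CASES]
```

In the third case the initial value 0 is itself not in the inevitable set, since the abstract counter may remain in the interval (0, i-1) forever; the step succeeds only because the hypothesis q(i-1) guarantees that i-1 is reached.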

3 Induction Principle 

While the above technique allows us to prove that $q(i-1)$ implies $q(i)$, for all $i$,
it does not tell us that this in fact implies $q(i)$ for all $i$. For this we need an
induction rule in the system. While precisely this rule could easily be added 
to the system, we will consider here a more general scheme, that allows mutual 
induction over multiple induction parameters. This will be illustrated in section 5 
using the “bakery” mutual exclusion algorithm. 

To present this, we must first consider how a proof is represented in the SMV 
system. A proof is a graph, in which the nodes are properties and an edge from 
property p to property q indicates that p is used to prove q. Proof edges are 
suggested to the prover by a statement like the following: 

using p prove q; 

The proof graph must be well founded, i.e., have no infinite backward paths. A 
parameterized property is considered to be equivalent to the set of its ground 
instances. Thus, for example, the property 

forall(i in T) q[i] : assert F(x = i);

is considered to be a shorthand for the infinite set of properties

q[0] : assert F(x = 0); q[1] : assert F(x = 1); ...

Similarly, a parameterized proof graph statement, such as

forall(i in T) using q[i-1] prove q[i];

is considered to be equivalent to the infinite set of statements:

using q[-1] prove q[0]; using q[0] prove q[1]; ...

Note, q[-1] is considered to be equivalent to “true”. In the case of finitely
bounded parameters, SMV can simply construct the proof graph consisting of all
of the ground instances and check that there are no cycles. However, for infinite 
types this is clearly not possible, and for large types it is impractical. Instead, 
an abstract proof graph is constructed, in which the nodes are parameterized 
properties and an arc exists between two such properties when there is an arc 
between any ground instances of the properties. This graph is homomorphic to 
the graph of ground instances, thus well-foundedness of the former is a sufficient 
(but not necessary) condition for well-foundedness of the latter. 
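For a finitely bounded parameter type, the ground-instance check described above is mechanical. The following Python sketch is only an illustration and not SMV's implementation: the names `ground_proof_graph`, `collapse` and `has_cycle` are hypothetical, and the type T is assumed to be {0, ..., size-1}. It builds the ground instances of `using q[i-1] prove q[i]`, checks them for cycles, and then shows what happens when every instance of q is mapped onto a single abstract node.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # node -> nodes it is proved from


def ground_proof_graph(size: int) -> Graph:
    """Ground instances of `using q[i-1] prove q[i]` for T = {0..size-1}.

    q[-1] is treated as "true", so q[0] has no dependencies.
    """
    return {("q", i): ([("q", i - 1)] if i > 0 else []) for i in range(size)}


def collapse(graph: Graph) -> Graph:
    """Map every ground instance (name, value) to its property name."""
    abstract: Graph = {}
    for (name, _value), deps in graph.items():
        targets = abstract.setdefault(name, [])
        for dep_name, _dep_value in deps:
            if dep_name not in targets:
                targets.append(dep_name)
    return abstract


def has_cycle(graph: Graph) -> bool:
    """Iterative three-colour depth-first search over the dependency edges."""
    white, grey, black = 0, 1, 2
    colour = {node: white for node in graph}
    for root in graph:
        if colour[root] != white:
            continue
        colour[root] = grey
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                colour[node] = black
                stack.pop()
            elif colour[nxt] == grey:
                return True
            elif colour[nxt] == white:
                colour[nxt] = grey
                stack.append((nxt, iter(graph[nxt])))
    return False
```

In this sketch the ground graph for any finite size is acyclic, while the collapsed graph has a self-loop on the single node for q, so a cycle check on the collapsed graph is sound but too coarse for an inductive statement.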

Unfortunately, this does not allow for inductive proofs. That is, the statement 

forall(i in T) using q[i-1] prove q[i];

implies an edge from q[i] to itself in the abstract proof graph, and hence a cycle. 
We need an abstraction of the proof graph that preserves well foundedness of 
inductive proofs. We can obtain this based on the following observation: an 
infinite path in a graph either is a simple path (with no repeated nodes), or it 
contains a cycle. Thus, if we can prove that every node in the graph is neither on 
a cycle, nor the root of an infinite simple (backward) path, we know the graph 
is well founded. The key is that we can show this using a different abstraction 
for each node. 

That is, let G = (V, E) be the (possibly infinite) proof graph on the set of
ground instances V, let P = {P_1, ..., P_n} be a finite partition of V, where each
P_i is the set of ground instances of property p_i, and for each instance v ∈ V, let
H_v = (V_v, E_v) be a finite graph (the abstract proof graph for v), such that there
is a homomorphism h_v : V → V_v from G to H_v.

Theorem 1. If there is an infinite backward path in G, then for some 1 ≤ j ≤ n
and v ∈ P_j,

- h_v(v) is on a cycle in H_v, or

- there is a backward path in H_v from h_v(v) to some w ∈ V_v such that for
infinitely many w' ∈ P_j, h_v(w') = w.

Proof. Let σ = s_0, s_1, ... be an infinite backward path in G. This path uses
either a finite or an infinite subset of V. In the first case, let v be some repeated
element of σ. There is a cycle in G containing v, hence a cycle in H_v containing
h_v(v) (since h_v is a homomorphism from G to H_v). In the second case, σ must
use an infinite number of elements of some P_j. Let v be some element of P_j on
σ, and let σ' be the tail of σ beginning with v. Since V_v is finite, it follows that
h_v maps an infinite number of elements w' ∈ P_j of σ' to some w ∈ V_v. □

By the theorem, it suffices to check each ground instance of a property in one
abstract proof graph, verifying that its image in the abstract graph is neither
on a cycle, nor on a backward path to a node abstracting an infinite set of
instances of the same property. In practice, we can use the same abstraction
of the natural numbers to generate the abstract proof graph H_v that we use to
verify instance v. This means that the abstract proof graphs within an inequality
class are isomorphic, hence we need only check one representative of each class.

Fig. 5. An abstract proof graph
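The notion of an inequality class can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the procedure SMV uses: the function names are hypothetical, and the terms compared are assumed to be 0, i-1 and i, as in the inductive statement over q. Each value of i is classified by the pairwise order relations among these terms, and the first value found in each class is kept as its representative.

```python
from typing import Dict, Tuple


def sign(a: int, b: int) -> int:
    """Return -1, 0 or 1 according to whether a < b, a = b or a > b."""
    return (a > b) - (a < b)


def inequality_class(i: int) -> Tuple[int, ...]:
    """Pairwise order relations among the terms 0, i-1 and i."""
    terms = [0, i - 1, i]
    return tuple(
        sign(a, b) for k, a in enumerate(terms) for b in terms[k + 1:]
    )


def representatives(limit: int) -> Dict[Tuple[int, ...], int]:
    """Smallest i below `limit` in each inequality class."""
    reps: Dict[Tuple[int, ...], int] = {}
    for i in range(limit):
        reps.setdefault(inequality_class(i), i)
    return reps
```

Under these assumptions only three classes arise, however large `limit` is chosen, with representatives i = 0, i = 1 and i = 2.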

Example. Consider our proof from the previous section. There are three cases
of feasible inequality relations to consider. In the case 0 < i - 1 < i, we abstract
type T to the set {0, (0, i-1), i-1, i, (i, ∞)}. Substituting these abstract values
into the proof graph statement, we obtain the abstract proof graph of figure 5.
We must check two properties of this graph: 

1. The node q[i] is not on a cycle. 

2. No backward path rooted at q[i] reaches an instance of q containing an 
infinite interval. 

Since the abstract graphs are isomorphic for all values of i satisfying 0 < i - 1 < i,
we need check only one instance in this class (say, i = 2). In fact, we can verify the
above properties for one representative of each inequality class, so we conclude 
that the proof graph is acyclic. 

On the other hand, suppose we had written instead: 

forall(i in T) using q[i+1] prove q[i];

In the abstract proof graph, we would find a backward path q[i], q[i+1],
q[(i+1, ∞)]. Since the last of these contains an infinite interval, we reject this graph.
In fact, there is an infinite backward chain in the proof, so the proof is not valid. 
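The two checks on the abstract proof graph can be sketched in Python as follows. This is a hedged illustration rather than SMV's algorithm: all names are hypothetical, the proof statement is assumed to have the form `using q[i+d] prove q[i]` with d = -1 or d = +1, and the abstraction is assumed to keep 0, i and i+d as distinguished points with the gaps between them as intervals. Because every value above the largest point falls into the same unbounded interval, enumerating a few values past that point is enough to find all abstract edges.

```python
from typing import Dict, List, Optional, Set, Tuple

AbsVal = Tuple[str, int, Optional[int]]  # ("pt", x, x) or ("iv", lo, hi)


def alpha(x: int, points: List[int]) -> AbsVal:
    """Abstract a natural number x relative to the distinguished points."""
    if x in points:
        return ("pt", x, x)
    lo = max(p for p in points if p < x)      # 0 is always a point
    above = [p for p in points if p > x]
    return ("iv", lo, min(above) if above else None)  # None = unbounded


def abstract_graph(i: int, d: int) -> Dict[AbsVal, Set[AbsVal]]:
    """Abstract backward edges of `using q[k+d] prove q[k]`, for instance i."""
    points = sorted(p for p in {0, i, i + d} if p >= 0)
    edges: Dict[AbsVal, Set[AbsVal]] = {}
    for k in range(max(points) + 3):
        src = alpha(k, points)
        edges.setdefault(src, set())
        if k + d >= 0:                        # q[-1] is "true": no node
            tgt = alpha(k + d, points)
            edges.setdefault(tgt, set())
            edges[src].add(tgt)
    return edges


def instance_ok(i: int, d: int) -> bool:
    """True if q[i] is not on a cycle and reaches no unbounded interval."""
    points = sorted(p for p in {0, i, i + d} if p >= 0)
    edges = abstract_graph(i, d)
    start = alpha(i, points)
    reached: Set[AbsVal] = set()
    frontier = list(edges[start])
    while frontier:
        node = frontier.pop()
        if node not in reached:
            reached.add(node)
            frontier.extend(edges[node])
    on_cycle = start in reached
    hits_infinite = any(n[0] == "iv" and n[2] is None for n in reached)
    return not on_cycle and not hits_infinite
```

With d = -1 the check succeeds for a representative of each inequality class, and with d = +1 it fails because the backward path from q[i] runs into the unbounded interval above i+1.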

Using this technique, we can handle more general induction schema than 
the simple example above. For example, we can use mutual induction over two 
invariants: 

... using p[i] prove q[i];

... using q[i-1] prove p[i];

Or, we can use induction simultaneously over two parameters:

... using p[i-1][j], p[i][j-1] prove p[i][j];

(here, imagine a grid in which each cell is proved using its neighbors to the
left and below). In section 5, we will see an example of induction over multiple
parameters.

4 A FIFO Buffer

As a simple example of proof by induction using model checking, we verify a 
hardware implementation of a FIFO buffer. We can decompose this problem 
into two parts. The first is to show that each data item output by the FIFO 
is correct, and the second is to show that the data items are output in the 
correct order. This separation is done by tagging each data item with an index 
number indicating its order at the input. Since only the ordering problem requires 
induction, we consider only that part of the proof here. 

The input inp and output out of our FIFO have three fields: valid, a boolean 
indicating valid data at the interface, data, the actual data item, and idx, the 
index number of the data item. To specify ordering at either interface, we simply 
introduce a counter cnt that counts the number of items received or transmitted
thus far. If there is valid data at the interface, we specify that the index of that
data is equal to cnt. This counter ranges over an ordset type called INDEX.

cnt : INDEX;

init(cnt) := 0;

if(valid) next(cnt) := cnt + 1;

ordered: assert G (valid ⇒ idx = cnt);

We create an instance of this specification for both input and output of the 
FIFO, and would like to prove that the property ordered at the input implies 
ordered at the output. 

Let us implement the FIFO as a circular buffer, with a large number of entries 
(say, 1024) so that it is too large to verify temporal properties of it directly by 
model checking. To solve this problem, we need to abstract the type of buffer 
addresses as well. Thus, we create another ordset type and declare the type of 
buffer addresses ADDR to be the subrange 0 . . . 1023 of this type. Note that 
there are two fixed constants of this type that appear in the code: 0 and 1023. 
For example, to increment the head pointer in the circular buffer, the following 
code is used: 

next(head) := (head = 1023) ? 0 : head + 1;
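The ordering specification and the wrap-around pointer update can be exercised on a small software model. The Python sketch below is an assumption-laden illustration, not the hardware design verified in the text: the class and function names are hypothetical, the depth is set to 8 rather than 1024 to keep the run short, and random simulation only tests the property on some traces, whereas the SMV proof covers all of them.

```python
import random


class CircularFifo:
    """Circular buffer with a write pointer `head` and a read pointer `tail`."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.slots = [None] * depth
        self.head = 0
        self.tail = 0
        self.size = 0

    def _advance(self, pointer: int) -> int:
        # Same shape as the SMV update: wrap to 0 at the last address.
        return 0 if pointer == self.depth - 1 else pointer + 1

    def push(self, item) -> None:
        assert self.size < self.depth, "buffer full"
        self.slots[self.head] = item
        self.head = self._advance(self.head)
        self.size += 1

    def pop(self):
        assert self.size > 0, "buffer empty"
        item = self.slots[self.tail]
        self.tail = self._advance(self.tail)
        self.size -= 1
        return item


def run(depth: int = 8, steps: int = 10_000, seed: int = 0) -> int:
    """Randomly push and pop tagged items; return the number of items output."""
    rng = random.Random(seed)
    fifo = CircularFifo(depth)
    in_cnt = 0   # input-side counter: tags each item with its index
    out_cnt = 0  # output-side counter: expected index of the next output
    for _ in range(steps):
        if fifo.size < depth and rng.random() < 0.5:
            fifo.push((in_cnt, rng.randrange(256)))
            in_cnt += 1
        elif fifo.size > 0:
            idx, _data = fifo.pop()
            assert idx == out_cnt, "ordered property violated at the output"
            out_cnt += 1
    return out_cnt
```

Here the input counter plays the role of cnt at the input interface and the assertion plays the role of the ordered property at the output.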

Now, we can prove the ordered property at the output by induction on the
value of cnt. We prove that if the last item output was numbered i - 1, then the
current one is numbered i. The natural abstraction needed to prove this would
use the values 0, i - 1 and i (we need 0, since the initial value must be zero). In
fact, SMV chooses this abstraction by default. Note that if the item indexed i
gets stored in address j of the circular buffer, then item i - 1 must have been
stored at address (j - 1) mod 1024. Thus, if we analyze cases on j, we only have to
consider two addresses in the array to prove the inductive step. To do the case
analysis in SMV, we say:

forall(i in INDEX) forall(j in ADDR)
subcase out.ordered[i][j] of out.ordered
for out.cnt = i ∧ tail = j;

The property ordered[i][j] says that, if the current cnt at the output is i, and
the current value of the tail pointer (i.e., the address of the current output) is j,
then the current index at the output must be cnt. Now, we use induction over i
in the following way: 

forall(i in INDEX) forall(j in ADDR)

using ADDR->{0,j-1,j,1023}, inp.ordered, out.ordered[i-1]
prove out.ordered[i][j];

In the abstracted models, there are at most four elements in the buffer, the
remainder being abstracted to ⊥. With this abstraction, the buffer implementation
can be model checked without difficulty. Note, we are not actually using
induction over the type ADDR here. We are simply using the ordered set abstraction
to make the model checking of a 1024 element buffer tractable. The
use of this abstraction does increase the number of cases that must be checked, 
however. We must check three cases of i times five cases of j (the additional fixed 
constant of type ADDR increases the number of feasible ordering relations) for 
a total of fifteen. Nonetheless, all of these cases can be verified by SMV in less 
than two seconds. 

We can also prove our buffer implementation for arbitrary depth by using an 
uninterpreted constant to represent the buffer depth. This would be of secondary 
interest in hardware verification, however, where resources are fixed and finite. 

5 Bakery Mutual Exclusion Algorithm 

Leslie Lamport’s “bakery” algorithm [Lam74] is a mutual exclusion algorithm
for a set of N processes, each running a program with two parts, a critical section
and a noncritical section. We have adapted it from Lamport’s original presentation
with minor changes for the present exposition.¹ For each process i, the algorithm
uses a boolean variable choosing[i], natural number variables number[i] and
max[i], and a variable count[i] that ranges over process id’s 1, ..., N. We denote
the type of natural numbers by NAT, and the type of process id’s by PID. These
can all be read and written by process i. In addition, choosing[i] and number[i]
are readable, but not writable, by other processes. Each process i starts execution
in its noncritical section with choosing[i] and number[i] initially equal to 0. The
following pseudo-code (not SMV input) gives the program for process i:

L1: {noncritical section; nondeterministically goto L1 or L2;}

L2: choosing[i] := 1;

L3: {count[i] := 1; max[i] := 0;}

L4: max[i] := maximum(max[i], number[count[i]]);

L5: if count[i] < N then {count[i] := count[i]+1; goto L4;}

L6: number[i] := max[i]+1;

L7: choosing[i] := 0;

L8: count[i] := 1;

L9: if choosing[count[i]] = 1 then goto L9;

L10: if number[count[i]] ≠ 0 and

(number[count[i]] < number[i] or
(number[count[i]] = number[i] and count[i] < i))

then goto L10;

L11: if count[i] < N then {count[i] := count[i]+1; goto L9;}

L12: critical section;

L13: {number[i] := 0; goto L1;}

¹ The proof described here disregards two important features of the algorithm: the
non-atomicity of reads and writes, and the possibility of crashes. SMV files for this
version and the general algorithm can be found at
http://www-cad.eecs.berkeley.edu/~kenmcmil/bakery.html.

In lines L3-L6, process i chooses a positive number number[i]; the ordered pair
(number[i], i) serves as process i’s “ticket” to enter the critical section. In lines
L8-L11, process i loops over all processes j, waiting at L9 until process j is not
in the act of choosing a ticket, and waiting at L10 until process j either does not
have a ticket (number[j] = 0) or has a ticket that is greater (in lexicographic
order) than process i’s ticket.

The algorithm has two key properties. First, it ensures safety: no two pro- 
cesses can ever be in the critical section (at L12) at the same time. Second, it 
guarantees liveness under an assumption of fairness: if every process continues 
to execute instructions (the fairness assumption), then any process that reaches 
L2 eventually gets to the critical section (the liveness guarantee). The fairness 
assumption includes the assumption that a process in the critical section will 
eventually leave the critical section. However, as indicated by the pseudo-code 
“nondeterministically goto L1 or L2”, a process may remain in the noncritical
section forever (and this does not interfere with the liveness of any other process). 
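To make the pseudo-code and the safety property concrete, the following Python sketch simulates the algorithm with one line executed atomically per step by a randomly chosen process. It is an illustration only, under stated assumptions: the function name and the random scheduler are hypothetical, reads and writes are atomic, and a random run tests mutual exclusion on one trace rather than proving it.

```python
import random
from typing import Tuple


def simulate_bakery(n: int = 3, steps: int = 20_000, seed: int = 0) -> Tuple[int, int]:
    """Return (critical-section entries, max processes seen at L12 at once)."""
    rng = random.Random(seed)
    ids = range(1, n + 1)
    pc = {i: 1 for i in ids}          # next line (1..13) of each process
    choosing = {i: 0 for i in ids}
    number = {i: 0 for i in ids}
    max_seen = {i: 0 for i in ids}    # the variable max[i] of the pseudo-code
    count = {i: 1 for i in ids}
    entries = 0
    max_in_cs = 0
    for _ in range(steps):
        i = rng.randint(1, n)         # the process scheduled in this step
        line = pc[i]
        if line == 1:
            pc[i] = rng.choice((1, 2))
        elif line == 2:
            choosing[i] = 1
            pc[i] = 3
        elif line == 3:
            count[i] = 1
            max_seen[i] = 0
            pc[i] = 4
        elif line == 4:
            max_seen[i] = max(max_seen[i], number[count[i]])
            pc[i] = 5
        elif line == 5:
            if count[i] < n:
                count[i] += 1
                pc[i] = 4
            else:
                pc[i] = 6
        elif line == 6:
            number[i] = max_seen[i] + 1
            pc[i] = 7
        elif line == 7:
            choosing[i] = 0
            pc[i] = 8
        elif line == 8:
            count[i] = 1
            pc[i] = 9
        elif line == 9:
            pc[i] = 9 if choosing[count[i]] == 1 else 10
        elif line == 10:
            j = count[i]
            # Wait while j holds a lexicographically smaller ticket.
            waits = number[j] != 0 and (number[j], j) < (number[i], i)
            pc[i] = 10 if waits else 11
        elif line == 11:
            if count[i] < n:
                count[i] += 1
                pc[i] = 9
            else:
                pc[i] = 12
                entries += 1
        elif line == 12:
            pc[i] = 13
        else:  # line 13
            number[i] = 0
            pc[i] = 1
        max_in_cs = max(max_in_cs, sum(1 for p in ids if pc[p] == 12))
    return entries, max_in_cs
```

The tuple comparison at line 10 is the lexicographic ticket order described above; the second component of the result should never exceed 1 if the safety property holds on the simulated trace.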

To encode the bakery algorithm in SMV we introduce some additional variables.
For each process i the variable pc[i] (i’s “program counter”) ranges over the
values {L1, L2, ..., L13} and designates the next line to be executed by process
i. The variable act takes on an arbitrary value in the range 1, ..., N at each time
step and designates the process that executes a line at that time step. Thus the
action taken at any time step depends on the value of pc[act]:

switch (pc[act]) {

L1: { next(pc[act]) := {L1, L2}; }

L2: { next(choosing[act]) := 1; next(pc[act]) := L3; }

...

L13: { next(number[act]) := 0; next(pc[act]) := L1; } }

The code also includes initial conditions for the processes (not shown here) and
the fairness condition:

forall (i in PID) fair[i]: assert G(F(act = i));

The safety and liveness properties of the algorithm can now be defined as follows: 







K.L. McMillan, S. Qadeer, and J.B. Saxe 



forall (i in PID) forall (j in PID) {

safe[i][j]: assert G(i ≠ j ⇒ ¬(pc[i] = L12 ∧ pc[j] = L12));
live[i]: assert G(pc[i] = L2 ⇒ F(pc[i] = L12)); }

In the limited remaining space, we concentrate on sketching the liveness 
proof, which is more interesting and difficult than the safety proof. The liveness 
property is easily proved using the following two lemmas, the first saying that 
any process that starts the loop in lines L3-L5 eventually completes it, and the 
second saying the same for the loop in lines L8-L11: 

forall (i in PID) {

reaches_L6[i]: assert G(pc[i] = L3 ⇒ F(pc[i] = L6));
reaches_L12[i]: assert G(pc[i] = L8 ⇒ F(pc[i] = L12));
using reaches_L6[i], reaches_L12[i], fair[i] prove live[i]; }

Note that the proof of live[i] relies on the fairness assumption fair[i] to ensure
that process i eventually gets from L2 to L3 and from L6 to L8. Of course, the
assumption of fairness is also essential for the proofs of the lemmas reaches_L6
and reaches_L12.

To prove that the loop in L3-L5 terminates (once it is started), we use a
helper lemma, stating that the j-th iteration is eventually completed, for any
process id i. The helper lemma is proved by straightforward induction: eventual
completion of the j-th iteration follows from completion of the (j-1)-th iteration,
together with the fairness assumption. Completion of the entire loop then follows
from fairness and completion of the last iteration. Here is the SMV code:

forall (i in PID) forall (j in PID) {

reaches_L5[i][j]: assert G(pc[i] = L3 ⇒ F(pc[i] = L5 ∧ count[i] = j));
using reaches_L5[i][j-1], fair[i] prove reaches_L5[i][j];
using reaches_L5[i][N], fair[i] prove reaches_L6[i]; }

The termination proof for the loop in lines L8-L11 (that is, for the lemma
reaches_L12) is more difficult in that it involves induction not only over iterations
of the loop, but also over tickets. To prove that a process i completes the loop,
we use the induction hypothesis that any process j with a lower ticket eventually
reaches (and by fairness, eventually leaves) the critical section, and so cannot
cause process i to wait forever at L10. We use the following three lemmas:

forall (n in NAT) forall (i in PID) forall (j in PID) {
reaches_L11[n][i][j]:

assert G(pc[i] = L8 ∧ number[i] = n ⇒ F(pc[i] = L11 ∧ count[i] = j));
lower_number_reaches_L12[n][j]:

assert G(number[j] < n ∧ pc[j] = L8 ⇒ F(pc[j] = L12));
lower_pid_reaches_L12[n][i][j]:

assert G(number[j] = n ∧ j < i ∧ pc[j] = L8 ⇒ F(pc[j] = L12)); }

The first states that a process with ticket (n, i) eventually completes the j-th
iteration of the loop. The other two together state the induction hypothesis that
any process j with a ticket lexicographically less than (n, i) eventually completes
the entire loop. We prove these three lemmas by mutual induction, over three
parameters n, i and j, where (n, i) is the ticket and j is the value of the loop
counter. For example, induction over j can be seen in the SMV command to
prove reaches_L11:

using fair[i], fair[j], reaches_L11[n][i][j-1],

lower_pid_reaches_L12[n][i][j], lower_number_reaches_L12[n][j],
reaches_L6[j], NAT->{0,n}, ...

prove reaches_L11[n][i][j];

To prove that process i eventually finishes waiting for process j, we assume that
if process j has a lower ticket it will eventually reach L12. The next time around
the loop, process j must choose a larger ticket. Note we also need reaches_L6,
to ensure that process j cannot make process i wait forever at L9. By writing
NAT->{0, n}, we specify that the abstraction for type NAT keep 0 as a distinguished
value, lest we get an abstraction too coarse to model the interaction of
the assignment at L13 with the test at line L10.

We omit here the similar, but simpler, proofs of the other two lemmas, 
which contain the inductive steps over n and i respectively. We remark sim- 
ply that the three lemmas are mutually dependent, and that well-foundedness 
of the entire mutual induction is automatically checked by SMV, as described 
in section 3. We also omit the command to prove that reaches_L12 follows from
lower_number_reaches_L12.

Note that we could have combined lemma lower_number_reaches_L12 and
lemma lower_pid_reaches_L12 (or even these two and reaches_L11) into a single
lemma. However, the proof would then have involved temporal case-splitting 
over four (or more) variables at once, rather than three. Increasing the number 
of simultaneous case splits increases the number of concrete values in the ab- 
stract types, thus making the proof inefficient in two ways. First, the resulting 
finer abstractions tend to be individually more costly to model-check. Second, 
more distinct abstractions must be checked in order to check a representative of 
every equivalence class. In general, it is desirable to structure proofs so that the 
abstractions used are as coarse as possible (but no coarser) . 

6 Conclusion 

By using an appropriate parameterized abstraction of the natural numbers, we 
can reduce a proof by induction over the natural numbers to a finite number 
of finite state verification problems. This is because, on the one hand, each 
abstracted type is finite, and on the other hand, the infinite set of abstractions 
falls into a finite number of isomorphism classes. As a result, we can check proofs 
by model checking that would ordinarily be considered to be only in the domain 
of theorem provers. The advantage is that, in a proof by model checking, we 
need not consider the details of the control of a system, since these are handled 
by state enumeration. This technique has been implemented in the SMV proof
assistant, and integrated with a variety of techniques for reducing infinite state
problems to finite state problems.

This technique is more general than previous techniques that use model 
checking as part of an induction proof [KM89,WL89] in that variables may range 
over unbounded types, and the proof is fully mechanically checked. Further, a 
novel induction scheme based on proof graph abstractions makes it possible to 
do fairly complex proofs using mutual induction over multiple variables, with- 
out explicit recourse to an induction rule, as the induction scheme is inferred by 
analyzing the abstract proof graphs. 

The method is not fully automated, in that it relies on the user to supply 
inductive properties, which may have to be stronger than the property being 
proved. Note, however, that it is not required to provide an inductive temporal 
invariant, since the model checker can, in effect, compute the strongest inva- 
riants of the abstract models. Since linear temporal logic with natural number 
variables is undecidable (by a trivial reduction from termination of a two-counter 
machine), the method described here is necessarily incomplete. 

A practical problem with the method is that the number of cases (i.e., feasible
inequality relations) expands very rapidly with the number of parameters and 
fixed constants of a given type. Thus, in the proof of the bakery algorithm, for 
example, considerable care had to be used to minimize the number of parameters 
in the lemmas. Clearly, techniques of reducing this case explosion (perhaps using 
weaker abstractions) would make the technique easier to apply. 

Finally, we note that the proof of the bakery algorithm, which relies substan- 
tially on properties of the natural numbers, was more difficult than proofs of 
considerably more complex systems (e.g., [McM99]) that are control dominated.
Nevertheless, we think this example shows that model checking can be used to 
advantage even in an area where finite control plays a relatively small part. This 
suggests that there may be many areas in which model checking can be applied 
that previously have not been considered amenable to finite state methods.

Abstract. The paper considers the problem of uniform verification of
parameterized systems by symbolic model checking, using formulas in
FS1S (a syntactic variant of the second-order logic WS1S) for the symbolic
representation of sets of states. The technical difficulty addressed in this
work is that, in many cases, standard model-checking computations fail
to converge.

Using the tool TLV[P], we formulated a general approach to the acceleration
of the transition relations, allowing an unbounded number of different
processes to change their local state (or interact with their neighbor)
in a single step. We demonstrate that this acceleration process solves the
difficulty and enables efficient symbolic model checking of many parameterized
systems, such as mutual-exclusion and token-passing protocols,
for any value of N, the parameter specifying the size of the system.
Most previous approaches to the uniform verification of parameterized
systems considered only safety properties of such systems. In this paper,
we present an approach to the verification of liveness properties and
demonstrate its application to prove accessibility properties of the considered
protocols.

Keywords: Symbolic model checking; Parametric systems; Acceleration;
Liveness; Regular expressions; WS1S



1 Introduction 

The problem of uniform verification of parameterized systems is one of the 
most thoroughly researched problems in computer-aided verification. The pro- 
blem seems particularly elusive in the case of systems that consist of regularly 
connected finite-state processes (a process network). Such a system can be model
checked for any given configuration, but this does not provide conclusive
evidence for the question of uniform verification, i.e., showing that the system is
correct for all possible configurations. In [KMM+97], we proposed an approach
to the uniform verification of parameterized systems based on symbolic model
checking in which the assertional language used to represent sets of reachable
global states is that of regular expressions over a finite alphabet which represents
the local state of each of the processes in the system. As a trivial illustrative
example, consider a parameterized system S(N) consisting of N processes arranged
in a linear array. Assume that the local state of each process can be
represented by the two values 0 and 1, where the state of a process P[i] is 1 iff
P[i] currently has the token which is passed around.

The initial global state (to which we refer as a configuration) can be described
by the regular expression I = 10* representing the global state in which the
leftmost process has the token. Note that even though every instance of the
system S(N) has a unique initial configuration, the expression 10* represents
the infinite set of initial configurations obtained by considering the infinitely
many different values of N.

The transition relation of this parameterized system can be represented by
the binary rewrite rule given by 10 → 01. This rewrite rule states that a
single step of the system applied to a configuration represented by a word w may
locate the substring 10 within w and replace it by the substring 01. Obviously,
such a step represents the transmission of a token from a process with a token
to its right neighbor, provided the neighbor is not currently in possession of a
token.

To represent such rewrite rules in the most general context, [KMM+97] sug- 
gested to use a finite-state transducer which is an automaton reading a string of 
pairs of letters, one representing the pre-transition configuration and the other 
representing the post-transition configuration. Using the standard notation of 
unprimed and primed values to respectively represent these two configurations, 
the transducer corresponding to the above transmission transition can again be 
represented by the following regular expression: 

T = (00' + 11')* (10') (01') (00' + 11')*

Given a finite-state transducer T representing the transition relation and a regular
expression E representing a set of configurations, it is not difficult to compute
the set of T-postimages or T-preimages of the configurations in E, which is
guaranteed to be expressible by another regular expression. For example, the
T-postimage of 10* is the regular expression 010*. We denote by E ∘ T and T ∘ E
the T-postimages (T-successors) and T-preimages (T-predecessors) of E, respectively.
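On concrete words of a fixed length, the transducer and the postimage operator can be tried out directly. The Python sketch below is an illustration under stated assumptions, not the symbolic computation on regular expressions described in the text: the function names are hypothetical, each pair of an unprimed and a primed letter is encoded as two adjacent characters, and the standard `re` module is used to match the paired word against T.

```python
import re
from typing import Iterable, Set

# T = (00' + 11')* (10') (01') (00' + 11')*, each pair written as pre+post.
T_PAIRS = re.compile(r"(?:00|11)*(?:10)(?:01)(?:00|11)*")


def in_T(pre: str, post: str) -> bool:
    """True if (pre, post) is a single transition according to T."""
    if len(pre) != len(post):
        return False
    paired = "".join(a + b for a, b in zip(pre, post))
    return T_PAIRS.fullmatch(paired) is not None


def post_image(words: Iterable[str]) -> Set[str]:
    """All words obtained by rewriting one occurrence of 10 into 01."""
    result: Set[str] = set()
    for w in words:
        for k in range(len(w) - 1):
            if w[k:k + 2] == "10":
                result.add(w[:k] + "01" + w[k + 2:])
    return result
```

For example, the only successor of the word 100 is 010, in agreement with the postimage of 10* being 010*, and every pair produced by `post_image` is accepted by `in_T`.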

To perform symbolic model checking we usually need the iterated versions of 
these two operators computed as follows: 

E ∘ T* = E + E∘T + (E∘T)∘T + ((E∘T)∘T)∘T + ···

T* ∘ E = E + T∘E + T∘(T∘E) + T∘(T∘(T∘E)) + ···

Now, if φ is a regular expression representing a property we wish to prove to be an
invariant of the system, then S(N) ⊨ φ for every N iff

(I ∘ T*) ∩ ¬φ = ∅ or (T* ∘ ¬φ) ∩ I = ∅,

where ¬φ denotes the complement of φ. The first clause corresponds to forward
exploration starting from the initial condition I while the second clause corresponds
to backwards exploration starting from the set of states violating the
property φ.

The difficulty specific to regular model checking of parameterized systems is
that, unlike BDD-based model checking of finite-state systems, the computation
of either I ∘ T* or T* ∘ ¬φ may fail to terminate. In fact, theoretical considerations
predict that there will be cases in which these computations cannot terminate.
Termination of the computation of I ∘ T* implies that the set of strings encoding
reachable configurations is a regular language, and it is easy to construct systems
in which the set of reachable configurations forms a non-regular context-free language.

However, experience with these methods shows that there are many cases
in which the set of reachable configurations is regular yet the straightforward
computation of I ∘ T* fails to converge. Assume that we wish to establish for the
above example system the invariance of the property φ = 0*10*, claiming that
all reachable configurations contain precisely one token. To apply backwards
exploration, we first compute the set of violating configurations, given by ¬φ =
0* + (0 + 1)*1(0 + 1)*1(0 + 1)*. The computation of T* ∘ ¬φ terminates in one
step, yielding T* ∘ ¬φ = ¬φ = 0* + (0 + 1)*1(0 + 1)*1(0 + 1)* which, obviously, has
an empty intersection with I = 10*, establishing that φ is an invariant of the
considered system.
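The backward argument can be checked on all words up to a small length. The following Python sketch is a bounded illustration only, with hypothetical names and an arbitrary length bound of 8 as assumptions; it does not replace the symbolic computation, which covers every N at once. It confirms that no non-violating word can step into the violating set, so the set is closed under T-preimages, and that no initial word is violating.

```python
import re
from itertools import product
from typing import Set

VIOLATING = re.compile(r"0*|[01]*1[01]*1[01]*")  # complement of 0*10*
INITIAL = re.compile(r"10*")                      # I = 10*


def successors(word: str) -> Set[str]:
    """Words reachable by one application of the rewrite rule 10 -> 01."""
    return {
        word[:k] + "01" + word[k + 2:]
        for k in range(len(word) - 1)
        if word[k:k + 2] == "10"
    }


def backward_check(max_len: int = 8) -> bool:
    """Bounded check that the violating set is preimage-closed and misses I."""
    for n in range(1, max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            bad = VIOLATING.fullmatch(w) is not None
            if not bad and any(VIOLATING.fullmatch(s) for s in successors(w)):
                return False  # a good word would have a violating successor
            if bad and INITIAL.fullmatch(w):
                return False  # an initial word would be violating
    return True
```

The check succeeds because the rewrite rule preserves the number of tokens, which is the reason the backward fixpoint is reached immediately.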

On the other hand, the computation of the forward exploration according to
I ∘ T* fails to terminate, yielding the following infinite sequence of approximations:

10* + 010* + 0010* + 00010* + ···

The source of the problem was identified in [ABJN99] as stemming from the
fact that the transition relation T represents a step in which only one process (or
a pair of contiguous processes) makes a move. The remedy proposed there
is to use the notion of an accelerated transition in which several (unboundedly
many) processes can make a move at the same step. For example, the accelerated
version of the transition relation T = (00' + 11')* (10') (01') (00' + 11')* can be
computed to be

T_a = (00' + 11')* (10') (00')* (01') (00' + 11')*

Applying the accelerated transition in a forward exploration terminates now in
a single step and yields I ∘ T* = 0*10*.
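The effect of acceleration can be observed on instances of fixed size. The Python sketch below is an illustration under stated assumptions, with hypothetical function names; it works on explicit sets of words for one value of N at a time, whereas the method in the text operates symbolically on all N. It counts how many rounds of forward exploration are needed with the plain rule 10 → 01 and with the accelerated relation, in which the token jumps over any run of 0s.

```python
from typing import Callable, Set, Tuple


def step(word: str) -> Set[str]:
    """Plain transition T: rewrite one occurrence of 10 into 01."""
    return {
        word[:k] + "01" + word[k + 2:]
        for k in range(len(word) - 1)
        if word[k:k + 2] == "10"
    }


def step_accel(word: str) -> Set[str]:
    """Accelerated transition T_a: a token moves right across a run of 0s."""
    out: Set[str] = set()
    for p, letter in enumerate(word):
        if letter != "1":
            continue
        q = p + 1
        while q < len(word) and word[q] == "0":
            out.add(word[:p] + "0" + word[p + 1:q] + "1" + word[q + 1:])
            q += 1
    return out


def rounds_to_fixpoint(
    n: int, succ: Callable[[str], Set[str]]
) -> Tuple[int, Set[str]]:
    """Forward exploration from 1 0^(n-1); return (rounds, reachable set)."""
    reached = {"1" + "0" * (n - 1)}
    rounds = 0
    while True:
        new = {v for w in reached for v in succ(w)} - reached
        if not new:
            return rounds, reached
        reached |= new
        rounds += 1
```

With the plain relation the number of rounds grows with N, which is the finite-instance shadow of the divergence described above, while the accelerated relation reaches the same set of configurations after one round for every N tried.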

The work in [ABJN99] proposes a “speed-up” (acceleration) operation which 
transforms a single-process transition relation presented by a transducer T into 
an accelerated transducer which represents the effect of many processors 
taking an action in the same step, under certain conditions restricting the de- 
pendency of a single-process action on the local states of the other processes. 
The analysis there is based on a language-theoretic representation of the asser- 
tions and the representation of transition relations by finite-state transducers. 
Using such acceleration techniques, [ABJN99] managed to verify fully automatically
various parameterized protocols such as the Bakery and Ticket algorithms
by Lamport, Burns’ protocol, and Dijkstra’s and Szymanski’s algorithms for mutual
exclusion.

The methods of [ABJN99] could only accelerate elementary transitions which
only modified the local state of one process at a time. This made them inapplicable
to the representation of systems which included synchronous message
passing, such as the binary transformation 10 → 01 appearing in our example
system. This drawback has been recently corrected in [JN00], which presents a
speed-up operation which can be applied to elementary transitions that modify
several contiguous processes at the same time.

The work presented here improves upon the results of [ABJN99] and [JN00]
in several directions. To start with, our presentation framework uses the logic
FS1S (a syntactic variant of WS1S, the weak second-order monadic theory of one
successor [Tho90]) to present sets of configurations, e.g. the initial condition
and the properties, as well as the transition relation. This uniform presentation
by a powerful logic enables us to formulate several acceleration schemes still
within the same language. Furthermore, the soundness of the transducer-based
acceleration schemes of [ABJN99] and [JN00] depends on particular assumptions
that the transition relation has to satisfy, such as particular forms of left-
and right-contexts. These have to be checked whenever one wants to apply the
acceleration schemes of [ABJN99] and [JN00] to a particular transition relation.
In our case, the acceleration is always sound and can never lead to false positives.
In the worst case, it will not produce a useful acceleration and the process
will continue to diverge even after the acceleration.

Using our acceleration schemes which are applicable to unary and binary ele- 
mentary transitions in an unrestricted way, we managed to verify the protocols 
considered in [ABJN99] in a very efficient manner, and consider some additional 
protocols which use synchronous communication, such as a token-passing proto- 
col for mutual exclusion and the distributed termination detection algorithm of 
[DFvG83]. 

However, the most important contribution of this paper is the extension of 
the regular model checking method to include verification of liveness properties, 
while all previous efforts concentrated on the parameterized verification of sa- 
fety properties. Using these extensions, we managed to verify the property of 
accessibility for some of the protocols considered above. 

Related Work 

There are several results on algorithmic verification of parameterized systems
[SG92,AJ98,CGJ95]. In most of these works the transitions are guarded by local
conditions involving the local states of a fixed (unparameterized) number of
processes, in contrast with the general global dependency which is allowed in
[KMM+97,ABJN99,JN00]. The notions of speed-ups and acceleration of transitions
were considered in [BG96,BGWW97,BH97,ABJ98]. However, the accelerations
considered there only condensed several moves of a fixed number of processes,
while in our case (and in [ABJN99,JN00]) we consider speed-ups obtained
by performing actions of an unbounded number of different processes, sequentially
or in parallel. Previous attempts to verify parameterized protocols such as
Burns' protocol [JL98] and Szymanski’s algorithm [GZ98,MAB+94,MP90] relied
on abstraction functions or lemmas provided by the user. Other approaches to
uniform parameterized verification are based on induction, where the user
supplies the induction hypothesis either in the form of an assertion or in the form
of a network invariant [CGJ95,KM89,WL89].

A recent work which has a significant overlap with our work has been presented
by Bodeveix and Filali in [BF00]. Similarly to our approach, they advantageously
employ the expressive power of WS1S to present explicit formulas which
capture various acceleration schemes. They report on a tool, FMona, which is
a high-level macro-processor for MONA [HJJ+96]. The main difference between
their work and ours is that, at this point, they do not consider liveness. Also,
on the technical level, unlike the TLV[P] tool which we use for the verification
reported in this paper, the FMona tool does not seem to support a programming
layer in which algorithms such as model checking for safety and liveness can be
programmed. As a result, if one wants to iterate the application of a transition
relation to a set of states until it converges, it is necessary to provide an a priori
bound n on how many iterations are necessary and to invoke the FMona macro
processor, which will expand the appropriate iteration into pure MONA code of
size linear in n.



2 The Logic FS1S

We use the logic FS1S (finitary second-order theory of one successor) as a
specification language for sets of global states of parameterized systems. This logic
is derived from the weak second-order logic of one successor [Tho90] and also
resembles the language M2L used in MONA [HJJ+96]. The main difference between
WS1S and FS1S is that, in FS1S, we assume the existence of a special variable
M which provides an upper bound to the size of all arrays. We found the use
of this common upper bound to be of much help in the description of circular
architectures such as rings. This is only a matter of convenience, because it is
always possible to introduce M as a second-order variable of WS1S and postulate
its upper-bound properties. It is well known that FS1S (as well as WS1S) has the
expressive power of regular expressions, as well as of finite automata, which are
the representation underlying our implementation. Following is a brief definition
of the logic.

Syntax 

We assume a signature $\Sigma : \{\Sigma_1, \ldots, \Sigma_k\}$ consisting of a finite set of finite
alphabets. The vocabulary consists of position variables $p_1, p_2, \ldots$ and, for each
$\Sigma_i \in \Sigma$, a set of $\Sigma_i$-array variables $X_i, Y_i, Z_i, \ldots$. The special position variable
M denotes the upper bound on the length of all arrays and all position variables.

• Position (first-order) terms:

The constant 1 and any position variable $p_i$ are position terms. If t is a
position term then so is t + 1.

• Letter terms:

■ Every $a \in \Sigma_i$ is a $\Sigma_i$-term.

■ If X is a $\Sigma_i$-array variable and t is a position term, then X[t] is a $\Sigma_i$-term.

• Atomic formulas:

■ $t_1 \sim t_2$, where $t_1$ and $t_2$ are position terms and $\sim\ \in \{=, <\}$.

■ x = y, where x and y are $\Sigma_i$-terms for some $\Sigma_i \in \Sigma$.

• Formulas:

■ An atomic formula is a formula.

■ Let $\varphi$ and $\psi$ be formulas. Then $\neg\varphi$, $\varphi \vee \psi$, $\exists p : \varphi$, $\exists X : \varphi$ are formulas,
where p is a position variable and X is an array variable.

For example, assume that $\Pi$ is an array over the alphabet $\Sigma_1 = \{N, T, C\}$
intended to represent the control location of a process in a process array
P[1], ..., P[M]. Similarly, assume that tok is a Boolean array (special case of
$\Sigma_2 = \{0, 1\}$) intended to represent the fact that process P[i] currently has the
token. Then, the WS1S-formula

$$\Theta : (\forall i : \Pi[i] = N) \wedge tok[1] \wedge \forall j \neq 1 : \neg tok[j]$$

characterizes the set of initial configurations in which all processes are in their
initial control location N and only the leftmost process (process P[1]) has the
token.

We refer the reader to [KMM+97] for the definition of the semantics of FS1S.

3 The Logic FS1S is Adequate

In this section we demonstrate the use of FS1S for expressing the constituents
of a parameterized system. As a running example, we will use program MUX of
Fig. 1 which implements mutual exclusion by synchronous communication.

The body of the program is a variable-size parallel composition of processes
P[1], ..., P[M]. Each process P[i] has two local state variables: a local boolean
variable tok whose initial value is 1 (true) for i = 1 and 0 (false) for all other
processes, and a control variable $\Pi$ ranging over the set of locations {N, T, C}
(the noncritical section, the trying section, and the critical section, respectively).
Process P[i] sends the boolean value 1 on channel $\alpha[i \oplus_M 1]$ to its right neighbor
($i \oplus_M 1$ is addition modulo M) and reads into variable tok a (true) boolean value
from its left neighbor on channel $\alpha[i]$. As seen in the program, process P[i] can
enter its critical section only if P[i].tok = 1.

As our computational model we use the model of fair discrete systems,
consisting of a set X of state variables, an initial condition $\Theta$, a transition relation
$\rho$, a set $\mathcal{J}$ of justice (weak fairness) requirements, and a set $\mathcal{C}$ of compassion
(strong fairness) requirements. We proceed to show how these constituents can
be specified in FS1S for system MUX.

The State Variables: We define the type

    state = record of ($\Pi$ : {N, T, C}, tok : boolean)

and the array variable

    X : array 1..M of state.

Note that this is equivalent to the definition of two arrays, the array $\Pi[1..M]$
and the Boolean array tok[1..M]. Therefore, we will often abbreviate $X[i].\Pi$ and
X[j].tok to $\Pi[i]$ and tok[j] respectively. On the other hand, we write X[i] = X[j]
as an abbreviation for $(X[i].\Pi = X[j].\Pi) \wedge (X[i].tok = X[j].tok)$ and $\exists X : \varphi$
as an abbreviation for $\exists X.\Pi : \exists X.tok : \varphi$.

Fig. 1. Parameterized Program MUX.

The Initial Condition: The initial condition can be given by the FS1S formula

$$\Theta : (\forall i : \Pi[i] = N) \wedge tok[1] \wedge \forall i \neq 1 : \neg tok[i]$$

The Transition Relation: The transition relation can be formed as the disjunction
of three types of elementary transitions. Using the abbreviation
$presX(j) = (X'[j] = X[j])$, these can be expressed as follows:

$$idle : \forall j : presX(j)$$

$$\begin{aligned}
\rho_1(X, X', i) :\ & idle \ \vee\ (\forall j \neq i : presX(j)) \ \wedge \\
& \big(\ \ (\Pi[i] = N) \wedge (\Pi'[i] = T) \wedge (tok'[i] = tok[i]) \\
& \ \ \vee\ (\Pi[i] = C) \wedge (\Pi'[i] = N) \wedge (tok'[i] = tok[i]) \\
& \ \ \vee\ (\Pi[i] = T) \wedge (\Pi'[i] = C) \wedge (tok'[i] = tok[i] = 1)\ \big)
\end{aligned}$$

$$\begin{aligned}
\rho_2(X, X', i) :\ & idle \ \vee\ (\forall j \notin \{i, i \oplus_M 1\} : presX(j)) \\
& \wedge\ (\Pi[i] = N) \wedge tok[i] \wedge (\Pi[i \oplus_M 1] \in \{N, T\}) \wedge \neg tok[i \oplus_M 1] \\
& \wedge\ (\Pi'[i] = N) \wedge \neg tok'[i] \wedge (\Pi'[i \oplus_M 1] = \Pi[i \oplus_M 1]) \wedge tok'[i \oplus_M 1]
\end{aligned}$$

Subtransition idle represents the case that the system does not change its state.
Subtransition $\rho_1(X, X', i)$ is a unary transition in which a single process P[i]
takes a local action that can only modify the local state of P[i]. All other
processes retain their local state. Finally, subtransition $\rho_2(X, X', i)$ corresponds to
a binary transition in which process P[i] sends the token to process $P[i \oplus_M 1]$.
Only the two involved processes change their local states.

We can now define the global transition relation by taking

$$\rho(X, X') = idle \vee (\exists i : \rho_1(X, X', i) \vee \rho_2(X, X', i)).$$

However, as explained in the introduction, this single-action transition relation
can be used in a few cases for backward-exploration model checking but will often
fail to converge when used in forward-exploration model checking.
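
For a fixed number of processes the single-action relation can be enumerated explicitly. The following Python sketch is only an illustration under our own assumptions: indices are 0-based (so the paper's P[1] is index 0), a configuration is a tuple of (location, token) pairs, and the names `initial`, `unary_moves` and `successors` are ours rather than part of the paper or of any tool.

```python
def initial(M):
    """Theta: every process is at N and only process 0 (the paper's P[1]) has the token."""
    return tuple(("N", j == 0) for j in range(M))


def unary_moves(loc, tok):
    """Non-idle local moves allowed by rho_1 for one process."""
    moves = []
    if loc == "N":
        moves.append(("T", tok))
    if loc == "C":
        moves.append(("N", tok))
    if loc == "T" and tok:
        moves.append(("C", tok))
    return moves


def successors(state):
    """All rho-successors: idle, rho_1(i) and rho_2(i) for every process i."""
    M = len(state)
    result = {state}  # the idle transition
    for i in range(M):
        loc, tok = state[i]
        # rho_1: only the local state of process i changes
        for new_local in unary_moves(loc, tok):
            result.add(state[:i] + (new_local,) + state[i + 1:])
        # rho_2: process i hands the token to its right neighbour (modulo M)
        r = (i + 1) % M
        rloc, rtok = state[r]
        if loc == "N" and tok and rloc in ("N", "T") and not rtok:
            nxt = list(state)
            nxt[i] = ("N", False)
            nxt[r] = (rloc, True)
            result.add(tuple(nxt))
    return result
```

Calling `successors(initial(3))` returns the idle successor, the three configurations in which one process moves from N to T, and the configuration in which the token has moved one position to the right.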

We defer the specification of the justice and compassion requirements of sy- 
stem MUX to Section 5 in which we discuss the verification of liveness properties, 
where the fairness requirements become relevant. 

3.1 Model Checking 

Having obtained the FS1S representation of the transition relation $\rho(X, X')$ of
a system such as MUX, there are several symbolic model checking tasks we can
perform. For an FS1S formula $\varphi(X)$ representing a set of configurations, we can
compute the $\rho$-successor and $\rho$-predecessor of $\varphi$ by the following expressions:

$$\varphi \circ \rho = unprime(\exists X : \varphi(X) \wedge \rho(X, X'))$$
$$\rho \circ \varphi = \exists V : \rho(X, V) \wedge \varphi(V),$$

where unprime is a substitution operation which transforms each occurrence of
X'[k] into X[k], and V is an auxiliary array variable of type state.

Note that $\rho \circ \varphi$ computes the set of states satisfying $EX\,\varphi$ from which, by
iteration and boolean operations, we can compute $EF\,\varphi$ and $AG\,\varphi$, provided the
iteration converges.
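
The successor and predecessor operators and the fixpoint iteration can be sketched on an explicit finite relation. This is a minimal Python illustration, not the symbolic FS1S computation of the paper: the relation is assumed to be a plain set of (state, successor) pairs, and the function names are hypothetical. On a finite relation the iteration always terminates, whereas in the parameterized setting convergence is not guaranteed.

```python
def post(phi, rho):
    """phi o rho: the rho-successors of the states in phi."""
    return {t for (s, t) in rho if s in phi}


def pre(rho, phi):
    """rho o phi: the rho-predecessors of phi, i.e. the states satisfying EX phi."""
    return {s for (s, t) in rho if t in phi}


def ef(rho, phi):
    """EF phi as the least fixpoint of Y = phi or (rho o Y)."""
    reach = set(phi)
    while True:
        new = reach | pre(rho, reach)
        if new == reach:
            return reach
        reach = new


def ag(rho, phi, universe):
    """AG phi, computed as the complement of EF (not phi)."""
    return universe - ef(rho, universe - phi)
```

For example, with `rho = {(0, 1), (1, 2), (2, 2), (3, 3)}`, the call `ef(rho, {2})` collects the states 0, 1 and 2, since state 3 can only loop on itself.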



4 Acceleration 



Acceleration condenses a potentially unbounded number of applications of
transitions into a single transition, by defining a single “accelerated transition
relation”. It is up to the user to observe that acceleration is required and to select
the appropriate acceleration schemes to be applied. Since all accelerations are
sound, there is no danger (except loss of time) in applying all the acceleration
schemes which are available in a particular implementation. Since the verification
problem for parameterized systems is, in general, undecidable [AK86], there
is no chance of accumulating a “complete” set of acceleration schemes. The best
we can hope for is the assembly of a large set of schemes which can cover many
of the useful examples.

To handle most of the cases in which regular model checking with a single-action
transition relation failed to terminate, we found it necessary to consider
three types of acceleration which we will now present.

4.1 Local Acceleration

In this mode of acceleration, we allow several actions to be taken in succession
by the same process P[i]. Given a unary transition relation $\rho_1(X, X', i)$, we can
compute its locally accelerated version by the repeated composition

$$\rho_1^{\alpha} = \rho_1 \vee \rho_1 \circ \rho_1 \vee (\rho_1 \circ \rho_1) \circ \rho_1 \vee ((\rho_1 \circ \rho_1) \circ \rho_1) \circ \rho_1 \vee \cdots,$$

where the composition $\rho_a \circ \rho_b$ is defined by

$$\rho_{a;b}(X, X', i) = (\exists V : \rho_a(X, V, i) \wedge \rho_b(V, X', i)).$$

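For example, applying local acceleration to the unary transition relation
$\rho_1(X, X', i)$ of program MUX, we obtain (after some manual simplification) the
following accelerated unary transition:

$$\begin{aligned}
\rho_1^{\alpha}(X, X', i) =\ & \Pi[i] \in \{N, T, C\} \wedge tok'[i] = tok[i] \\
& \wedge\ (\forall j \neq i : \Pi'[j] = \Pi[j] \wedge tok'[j] = tok[j]) \\
& \wedge\ \big(\ \ \Pi'[i] = \Pi[i] \\
& \qquad \vee\ tok[i] = 0 \wedge \Pi'[i] \in \{N, T\} \\
& \qquad \vee\ tok[i] = 1 \wedge \Pi'[i] \in \{N, T, C\}\ \big)
\end{aligned}$$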
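
The repeated composition can be mimicked on the local state of a single process by iterating the unary step to a fixpoint. The sketch below is a hedged Python illustration only: it works on explicit local states (location, token) rather than on FS1S formulas, and the names `step` and `local_closure` are our own.

```python
def step(loc, tok):
    """Locations reachable by one application of rho_1 (idling included)."""
    out = {loc}
    if loc == "N":
        out.add("T")
    if loc == "C":
        out.add("N")
    if loc == "T" and tok:
        out.add("C")
    return out


def local_closure(loc, tok):
    """rho_1, rho_1 o rho_1, ... accumulated until no new location appears."""
    reached = {loc}
    while True:
        new = set(reached)
        for l in reached:
            new |= step(l, tok)
        if new == reached:
            return reached
        reached = new
```

Because a unary transition never changes the token, the closure is computed separately for `tok` true and `tok` false; the loop terminates after at most three rounds since there are only three locations.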



4.2 Global Acceleration of Unary Transitions 

Next, we consider the acceleration of a unary transition in which each of a set
of processes takes a single action. Assume as before that the unary transition
relation of process P[i] is given by $\rho_1(X, X', i)$, and that $idle \rightarrow \rho_1$. The following
formula expressing this acceleration uses the auxiliary state-array variables T
and V.

$$\rho_1^{\beta}(X, X') = \forall i : \left(
\begin{array}{l}
X'[i] = X[i] \\
\vee\ \exists T, V : \left(
\begin{array}{l}
\forall j : \left(
\begin{array}{l}
T[j] = \textbf{if } j < i \textbf{ then } X'[j] \textbf{ else } X[j] \\
\wedge\ V[j] = \textbf{if } j \leq i \textbf{ then } X'[j] \textbf{ else } X[j]
\end{array}\right) \\
\wedge\ \rho_1(T, V, i)
\end{array}\right)
\end{array}\right)$$

This accelerated transition applies $\rho_1(X, X', i)$ to processes P[1], ..., P[M] in
sequential order. Every activated process P[i] may non-deterministically choose to
idle (which is one of the options allowed by $\rho_1$) or change its local state according
to $\rho_1$. For process P[i] we require that, after all processes P[1], P[2], ..., P[i-1]
have taken their actions, we reach a configuration from which P[i] can take its
action. This is done by forming the two arrays T and V, where T represents the
configuration prior to P[i]’s action and V represents the configuration resulting
from P[i]’s action.

For example, applying global acceleration to the accelerated unary transition
relation $\rho_1^{\alpha}(X, X', i)$ of program MUX, we obtain (after some manual
simplification to improve readability) the following accelerated unary transition:

$$\begin{aligned}
\rho_1^{\beta}(X, X') = \forall i :\ & \Pi[i] \in \{N, T, C\} \wedge tok'[i] = tok[i] \\
& \wedge\ \big(\ \ \Pi'[i] = \Pi[i] \\
& \qquad \vee\ tok[i] = 0 \wedge \Pi'[i] \in \{N, T\} \\
& \qquad \vee\ tok[i] = 1 \wedge \Pi'[i] \in \{N, T, C\}\ \big)
\end{aligned}$$

Note that the acceleration scheme presented here proceeds from left to right.
It is straightforward to define an acceleration scheme which proceeds from right
to left.
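
The left-to-right sweep can be sketched on explicit configurations. The Python fragment below is an illustration under our own assumptions (0-based indices, configurations as tuples of (location, token) pairs, hypothetical function names); it applies the plain unary transition once per process, so each intermediate tuple plays the role of the array T before process i moves and of V after it.

```python
def unary_options(local):
    """Local states process i may have after one rho_1 step (idling included)."""
    loc, tok = local
    opts = {local}
    if loc == "N":
        opts.add(("T", tok))
    if loc == "C":
        opts.add(("N", tok))
    if loc == "T" and tok:
        opts.add(("C", tok))
    return opts


def unary_sweep(state):
    """Apply rho_1 to processes 0, 1, ..., M-1 in order, collecting all outcomes."""
    frontier = {state}
    for i in range(len(state)):
        frontier = {
            before[:i] + (new_local,) + before[i + 1:]
            for before in frontier          # configuration prior to process i's action
            for new_local in unary_options(before[i])
        }
    return frontier
```

Starting from the initial configuration of three processes, one accelerated step already yields all eight combinations in which each process is at N or T.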

4.3 Global Acceleration of Binary Transitions 

Finally, let us consider the acceleration of a binary transition, such as $\rho_2(X, X', i)$
previously presented for program MUX.

Unlike the acceleration of unary transitions, where the local state of each 
process changed at most once, in the case of binary acceleration some processes 
may change their local state twice. For example, they may change their state once 
when they receive the token from their left neighbor and then once more when 
they send the token to their right neighbor. Thus the acceleration of a binary 
token-passing transition may in one step move the token from process P[i] to 
process P[j] for an arbitrary j > i. To accommodate the phenomenon that some 
processes may change their values twice, we employ an additional state-array W 
to save the sequence of intermediate local states for these processes. 

Let $\rho_2(X, X', i)$ be a binary transition which may affect at most the
components X[i] and $X[i \oplus_M 1]$. Without loss of generality, assume that $idle \rightarrow \rho_2$.
The formula $\rho_2^{\gamma}(X, X')$, expressing the global acceleration of $\rho_2(X, X', i)$, is given
by

As we can see, in the binary acceleration case, we sequentially apply the binary
transition $\rho_2$ to processes P[1], ..., P[M-1], where any of them may
nondeterministically choose to take the idling transition. In the general case, each of the
processes P[2], ..., P[M-1] may change their local states at most twice, while
processes P[1] and P[M] may change their local state at most once. We use
the auxiliary array W to store the intermediate value of the local state of all
processes.

Note that this acceleration scheme does not apply $\rho_2$ to process P[M]. When
we compute the total transition relation we add $\rho_2(X, X', M)$ as an additional
explicit disjunct.

These acceleration schemes were successfully applied to program MUX and
transformed the regular model checking procedure based on forward exploration
from a divergent process into an efficiently convergent one, requiring no more
than 4 iterations to converge in a matter of a few seconds. More details about
these computations are presented in Section 6.
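
The effect of binary acceleration on the token can be illustrated on explicit configurations. The following Python sketch is our own illustration, not the FS1S formula of the paper: indices are 0-based, the names `pass_token` and `binary_sweep` are hypothetical, and the sweep leaves out the last process, as described above.

```python
def pass_token(state, i):
    """Outcomes of rho_2 for process i: idle, or hand the token to the right neighbour."""
    M = len(state)
    r = (i + 1) % M
    (loc, tok), (rloc, rtok) = state[i], state[r]
    out = {state}
    if loc == "N" and tok and rloc in ("N", "T") and not rtok:
        nxt = list(state)
        nxt[i] = ("N", False)
        nxt[r] = (rloc, True)
        out.add(tuple(nxt))
    return out


def binary_sweep(state):
    """Apply rho_2 to processes 0, ..., M-2 in order, collecting all outcomes."""
    frontier = {state}
    for i in range(len(state) - 1):
        nxt = set()
        for s in frontier:
            nxt |= pass_token(s, i)
        frontier = nxt
    return frontier
```

With four processes and the token initially at index 0, a single sweep can leave the token at any index from 0 to 3; a process in the middle receives the token and passes it on within the same accelerated step, which is why its local state may change twice.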

5 Liveness 

All of the previous results for the uniform algorithmic verification of parame- 
terized systems concentrated on proofs of safety properties. Here we present 
an approach to the verification of liveness properties, using regular symbolic 
model checking. The main problem with parameterized verification of liveness 
properties is not so much that the property to be proven is more complex, but 
that we have to take into account an unbounded number of fairness assumptions, 
several for each process, and that these requirements are also parameterized. To 
appreciate the problem, let us specify the fairness requirements associated with
program MUX which, for the sake of simplicity of presentation, we restrict to
justice (weak fairness) only.

5.1 Justice Requirements for Program MUX 

There are three justice requirements associated with each process of program
MUX. Respectively, they require that the process will never get stuck at location
C, that it will never get stuck at location T while the process has the token, and
that the process will not retain the token forever while its right neighbor is
continuously ready to receive it. In the computational model of fair discrete systems,
justice requirements are presented as a set of assertions $\mathcal{J} = \{J_1, \ldots, J_k\}$, with
the requirement that a computation should infinitely often visit states satisfying
$J_j$ for each j = 1, ..., k. In the parameterized case, each justice requirement is
also parameterized by a process index i, and the requirement should be extended
to cover all $i \in [1..M]$.

For program MUX, the justice requirements are given by

$$J_1[i] : \neg(\Pi[i] = C)$$

$$J_2[i] : \neg((\Pi[i] = T) \wedge tok[i])$$

$$J_3[i] : \neg(tok[i] \wedge (\Pi[i \oplus_M 1] \in \{N, T\}))$$
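
On explicit configurations the three requirements are simple predicates. The Python sketch below is illustrative only; it assumes 0-based indices and configurations represented as tuples of (location, token) pairs, and the function names are ours.

```python
def j1(state, i):
    """J_1[i]: process i is not at location C."""
    return state[i][0] != "C"


def j2(state, i):
    """J_2[i]: process i is not at T while holding the token."""
    loc, tok = state[i]
    return not (loc == "T" and tok)


def j3(state, i):
    """J_3[i]: process i does not hold the token while its right neighbour is at N or T."""
    right_loc = state[(i + 1) % len(state)][0]
    return not (state[i][1] and right_loc in ("N", "T"))
```

A computation is just with respect to these requirements if, for every i, it visits states satisfying each of the three predicates infinitely often.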

In theory, one may try to verify a liveness property “every p is eventually followed
by q” of a parameterized system using the standard symbolic model-checking
algorithm. The core of this algorithm is the computation of the set of states
lying on a fair $\neg q$-path. This computation can be succinctly described by the
following fix-point formula:

$$E_f G\,\neg q = \nu Y.\Big(\neg q \wedge \rho \circ Y \wedge \forall i : \bigwedge_{j} \big((\rho \wedge \neg q)^* \circ (Y \wedge J_j[i])\big)\Big)$$

Unfortunately, this computation seldom converges, even if we use an accelerated
version of the transition relation. This is certainly the case for program MUX.

5.2 Detecting Bad Cycles

Since the systems we analyze are finite-state (for every value of their parameter),
it is obvious that the formula $E_f G\,\neg q$ characterizes the states from which there
exists a $(\neg q)$-path leading to a fair $(\neg q)$-cycle, where the cycle being fair means
that it visits at least once a $J_j$-state, for each j = 1, ..., k. Denoting by G the set
of states that participate in a fair $(\neg q)$-cycle, an equivalent requirement is that
each $s \in G$ has a successor in G and that, for each $s \in G$ and each j = 1, ..., k,
there exists a cycle from s to itself which visits on the way some $J_j$-state.

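Assume that $\rho(X, X')$ represents the total transition relation of the
parameterized system, after all accelerations. The following algorithm computes the set
of states participating in a fair $(\neg q)$-cycle:

1. $\varphi_1 := (\Theta \circ \rho^*) \wedge \neg q$

2. $\rho_1 := \rho \wedge \varphi_1 \wedge (\forall i : U'[i] = U[i])$

3. $\varphi_2 := \varphi_1 \wedge (\forall i : U[i] = X[i])$

4. $\varphi_3 := \varphi_2 \wedge \rho_1^* \circ (\rho_1 \circ \varphi_2)$

5. for j := 1, ..., k do

6.     $\varphi_3 := \varphi_3 \wedge (\forall i : \rho_1^* \circ (J_j[i] \wedge \varphi_1 \wedge (\rho_1^* \circ \varphi_3)))$

7. $\varphi_4 := (\exists U : \varphi_3)$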



Line 1 places in $\varphi_1$ the set of $(\neg q)$-states which are reachable. Line 2 places in
$\rho_1$ a version of $\rho$ restricted to move only within $\varphi_1$-states and to preserve a set
of variables called U, which is a newly introduced copy of the state variables.
Line 3 adds to the sets of states the interpretation of the auxiliary variables
U and places in $\varphi_2$ the subset of $\varphi_1$-states in which the interpretations of X
and U agree. Line 4 places in $\varphi_3$ the set of $\varphi_2$-states from which there exists a
non-empty $\rho_1$-path leading to another $\varphi_2$-state.

To see this, consider a state $s_1$ belonging to $\varphi_3$, and let $s_2$ be the $\varphi_2$-state
reached at the end of the non-empty $\rho_1$-path. Since $s_2$ is a $\varphi_2$-state, we know that
$s_2[U] = s_2[X]$, i.e. the interpretation of U in $s_2$ is identical to the interpretation
of X in $s_2$. Since any $\rho_1$-path preserves the interpretation of U, we also have
that $s_1$ and $s_2$ agree on the interpretation of U, i.e., $s_1[U] = s_2[U]$. Since $s_1$
is also a $\varphi_2$-state, it follows that $s_1[X] = s_1[U]$. Consequently, we have that
$s_1[X] = s_1[U] = s_2[X] = s_2[U]$, which implies that $s_1$ and $s_2$ are identical states
and, therefore, $s_1$ participates in a non-empty $\rho_1$-cycle.

Following a similar argument, the iterations at line 6 retain in $\varphi_3$ only the
$\varphi_2$-states which reside on a cycle containing a $J_j[i]$-state for each j and each i.
The cycles may be different for different values of j and i, but they can always
be combined into a very big cycle which contains them all, and may revisit the
originating state many times.

It follows that, when (and if) the algorithm terminates, $\varphi_4$ contains the states
which reside on a non-empty fair $(\neg q)$-cycle.
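
The idea behind the algorithm can be sketched for an explicit finite graph. The Python fragment below is a hedged illustration, not the symbolic FS1S procedure: the relation is assumed to be a dictionary from states to successor lists, each justice requirement is an explicit set of states, and the names are ours. Remembering the start state s in the outer loop plays the role of the auxiliary copy U.

```python
def reach_from(s, rho, allowed):
    """States reachable from s by a non-empty rho-path staying inside `allowed`."""
    seen, stack = set(), [s]
    while stack:
        u = stack.pop()
        for v in rho.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def fair_cycle_states(rho, phi1, justice):
    """States of phi1 lying on non-empty cycles inside phi1 that visit every justice set."""
    result = set()
    for s in phi1:                       # s is the remembered copy (the role of U)
        forward = reach_from(s, rho, phi1)
        if s not in forward:
            continue                     # no non-empty cycle through s
        # states lying on some cycle through s
        on_cycle = {t for t in forward if s in reach_from(t, rho, phi1)}
        if all(on_cycle & J for J in justice):
            result.add(s)
    return result
```

The cycles found for different justice sets may differ, but since they all pass through s they can be concatenated into one fair cycle, as argued above. The quadratic cost of remembering every start state mirrors the penalty of the extra copy of the state variables.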

The algorithm presented above can, in principle, be used also for conventional
(non-parametric) symbolic model checking of liveness properties. However
it is not advisable to do so, because the algorithm is highly inefficient in the
conventional context due to the introduction of the auxiliary copy U of the state
variables.

Normally, assertions on states and transition relations are specified as having
the types $\varphi : V \rightarrow \{0, 1\}$ and $\rho : V \times V' \rightarrow \{0, 1\}$. When adding an
additional copy of the state variables we obtain assertions $\varphi : V \times U \rightarrow \{0, 1\}$
and $\rho : V \times U \times V' \times U' \rightarrow \{0, 1\}$.

Note that all the work on acceleration actually computes $\rho \circ \rho$ separately from
its application to any assertion $\varphi$. This kind of computation is usually avoided
whenever possible. For example, in symbolic backwards exploration, it is more
efficient to compute $\rho \circ (\rho \circ \varphi)$ rather than $(\rho \circ \rho) \circ \varphi$.

For these reasons the additional copy of the state variables exacts a heavy
penalty, as is evident from the performance figures presented in Table 1 of
Section 6. However, in the parameterized context, this is the only fully automatic
algorithm we managed to successfully use for the verification of liveness
properties.

5.3 Liveness Using Pseudo Cycles 

Realizing the heavy price one has to pay for a full second copy of the state
variables, we developed another approach which replaces the notion of a cycle by
a pseudo-cycle. Assume that the set of reachable states is partitioned by a
partition $\Pi$ into a set of disjoint classes. A pseudo-cycle, relative to the partition $\Pi$,
is a path which begins and ends in two states belonging to the same class. Note
that when the partition $\Pi$ is the finest possible, that is, each class contains a
single state, the notions of a pseudo-cycle and a cycle coincide.

To use this approach, the user has to provide a parameterized assertion E(i),
which defines the partition, consisting of a class for each value of i. The
pseudo-cycle method is guaranteed to be sound but may produce false negatives due to
its approximative nature.

The following is the improved algorithm for finding fair $(\neg q)$-pseudo-cycles:

1. $\varphi_1 := (\Theta \circ \rho^*) \wedge \neg q$

2. $\rho_1 := \rho \wedge \varphi_1$

3. $\varphi_2 := \varphi_1 \wedge E(i)$

4. $\varphi_3 := \varphi_2 \wedge \rho_1^* \circ (\rho_1 \circ \varphi_2)$

5. for j := 1, ..., k do

6.     $\varphi_3 := \varphi_3 \wedge (\forall i : \rho_1^* \circ (J_j[i] \wedge \varphi_1 \wedge (\rho_1 \circ \varphi_2)))$

Let E(i) be an assertion such that $\varphi_1 \rightarrow E(i)$. E(i) should be such that it
partitions the space of $(\neg q)$-reachable states. This partition corresponds to the
set of state classes we use in order to find pseudo-cycles.

The improved algorithm is similar to the original one, except for lines 2 and 3,
in which we omitted the references to U, and line 7, which is omitted entirely.
Instead, line 3 includes a conjunct of E(i). The original constraint on line 3,
$(\forall i : U[i] = X[i])$, uses U to form the finest partition, where each partition class
contains only a single state.

It is clear that if there exists a real fair $\neg q$-cycle, then the improved algorithm
will find it. Therefore, if the algorithm declares that there are no bad
pseudo-cycles, this implies that, in particular, there are no bad cycles, which establishes
the soundness of the algorithm when it is used to deduce the absence of any bad
cycles.
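
The relationship between cycles and pseudo-cycles can be illustrated on an explicit graph. The Python sketch below is a simplified illustration that ignores fairness: the relation is assumed to be a dictionary from states to successor lists, the partition is given by a hypothetical function `cls` mapping a state to its class, and all names are ours.

```python
def nonempty_reach(s, rho):
    """States reachable from s by a non-empty rho-path."""
    seen, stack = set(), [s]
    while stack:
        u = stack.pop()
        for v in rho.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def on_cycle(rho):
    """States that can return to themselves."""
    return {s for s in rho if s in nonempty_reach(s, rho)}


def on_pseudo_cycle(rho, cls):
    """States that can reach some state of their own class by a non-empty path."""
    return {
        s for s in rho
        if any(cls(t) == cls(s) for t in nonempty_reach(s, rho))
    }
```

Every state on a real cycle is also on a pseudo-cycle, so an empty pseudo-cycle set implies that there are no cycles; with the finest partition (`cls` the identity) the two sets coincide, while a coarser partition may report spurious pseudo-cycles.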

6 Results 

In Table 1, we present the results of our regular uniform verification applied to
several well-known algorithms. The results do not include the computations of
the accelerated transitions. It is obvious that the verification of safety properties
is significantly more efficient than the verification of liveness properties.



                         Safety              Liveness            Improved Liveness
Algorithm                Time   Iterations   Time   Iterations   Time   Iterations
Token ring               0.4    3            53     40           9.2    32
Szymanski                0.2    8            -      -            -      -
Termination detection    5.6    9            -      -            -      -
Dining philosophers      0.6    3            -      -            -      -

Table 1. Experimental results (times in seconds)



7 Conclusions 

In this paper we presented several significant extensions to the state of the art in
uniform verification of parameterized systems. We demonstrated the expressive
power of the logic FS1S as an efficient vehicle for expressing both the system
constituents and the meta-operations of acceleration. We presented several
acceleration schemes that lead to very efficient regular model checking of
parameterized safety properties. Finally, we presented the first approach to the
uniform verification of liveness properties of parameterized systems using the
FS1S framework and the TLV[P] tool.



Abstract. The Available Bit Rate protocol (ABR) for ATM networks
is well-adapted to data traffic by providing minimum rate guarantees 
and low cell loss to the ABR source end system. An ABR conformance 
algorithm for controlling the source rates through an interface has been 
defined by ATM Forum and a more efficient version of it has been de- 
signed in [13]. We present in this work the first complete mechanical 
verification of the equivalence between these two algorithms. The proof 
is involved and has been supported by the PVS theorem-prover. It has 
required many lemmas, case analysis and induction reasoning for the 
manipulation of unbounded scheduling lists. Some ABR conformance 
protocols have been verified in previous works. However these protocols 
are approximations of the one we consider here. For instance, the algo- 
rithms mechanically proved in [10] and [5] consider scheduling lists with 
only two elements. 



Introduction 

The Available Bit Rate protocol (ABR) for ATM networks is well-adapted to 
data traffic by providing minimum rate guarantees and low cell loss to the ABR 
source end system. The protocol relies on a contract between the operator who 
ensures a minimum rate and the source who must respect a rate that is dyna- 
mically allocated to him, according to the resources available in the networks. 
Due to its flexibility the ABR service admits elaborated traffic management me- 
chanisms. To avoid congestion the operator should control in real-time that the 
actual rate consumed by every ABR application is consistent with the allowed 
rate. Several algorithms for this conformance control have been proposed and 
discussed in standardization committees. 

* Supported by CNET CTf 96 IB 008 and Action de Recherche Cooperative INRIA
PRESYSA



E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 344—357, 2000. 
© Springer- Verlag Berlin Heidelberg 2000 




Ideal Incremental ABR Conformance Algorithm 345 



It is essential for an operator to give evidence that the conformance control of 
the service he proposes does not jeopardize the quality of service (QoS) provided 
by the ATM network. For such a task formal validation through mathematical 
arguments is required. However conformance control verification often involves 
complex case analysis or inductions. This has motivated some operators to em- 
ploy automated verification tools such as proof-assistants or model-checkers to 
process these proof obligations. 

The algorithm Acr that has been defined by ATM Forum is considered to give 
the optimal conformance control, in that it computes the minimal allowed rate 
among the other algorithms. An algorithm (B’) for computing an approximation 
of the optimal control has been designed by C. Rabadan from France-Telecom 
[12]. It is more efficient since the next two rates to be controlled are scheduled 
and are updated when receiving RM-cells. This incremental algorithm has been 
generalized in [13] to the scheduling of an arbitrary number of rates in the future. 
Our goal here is to derive a mechanical proof that this ideal incremental algo- 
rithm is indeed equivalent to the reference algorithm Acr. By this we mean that 
every step in the equivalence proof has been verified mechanically. The theorem 
prover we use is PVS [11]. Although PVS is interactive and operates under the
direct control of the user, it is capable of large autonomous deduction steps by
appealing to decision procedures for arithmetic, to rewriting and induction.

Related works Some ABR conformance protocols have been automatically
verified in previous works. However all these protocols are approximations of Acr.
In particular, unlike our case, they assume a bound on the number of rates to
be scheduled. For instance Algorithm B’ of C. Rabadan [12] admits scheduling
lists with only two elements. It has been proved recently in [10]. This proof is
based on the calculus of weakest preconditions [6] (inductive invariants) and
has been completely formalized with the COQ proof-assistant [2]. According to
[10] the correctness proof of Algorithm B’ has been a key argument in the
standardization process of ABR. Several proof techniques have been tried
in the FORMA project (http://www-verimag.imag.fr) for the validation of
ABR protocols. But the model checking approaches have been hindered by the
numerical parameters of the algorithm. L. Fribourg has also obtained good
results with extended timed automata [7]. A successful proof of protocol B’ with
the parameterized timed automata of HyTech [8] has been reported in [5].

Layout of the paper We first describe the principle of ABR Conformance
in Section 1. Then we introduce the algorithms Acr and Acrl for controlling
the ABR Conformance in Sections 2 and 3 respectively. The rest of the paper is
devoted to the equivalence proof of these two algorithms. We first introduce some
key properties in Section 4, then an overview of the proof in Section 5. Since
the PVS proof is too complex to be presented in extenso we give a skeleton that
follows closely the mechanical proof. We comment on the mechanical proof in
Section 6. For the interested reader, the full PVS specification and proof scripts
can be found at http://www.loria.fr/~stratula/abr.

1 ABR Conformance Control

ATM (Asynchronous Transfer Mode) technology allows networks to transmit on
the same media various applications whose needs differ in terms of data-flow
rate or quality of service. ATM is a connection-oriented technology since
users should declare service requirements and traffic parameters to all
intermediate switches when initializing connections. They may also agree to control
these parameters on demand. In order to guarantee QoS a traffic contract
specifying a traffic mode is negotiated when the connection is set up. Traffic
management should ensure that users get their desired QoS although traffic demand
is constantly varying. In other words traffic management should ensure that all
contracts are met.

In order to solve the critical issue of congestion control the effective rate of
cells emitted by user applications is controlled by a conformance algorithm called
GCRA (Generic Control of Cell Rate Algorithm). Among the possible traffic
modes, Constant Bit Rate (CBR) and Variable Bit Rate (VBR) were designed
mainly for traffic like voice and video. The Available Bit Rate (ABR) service
class is especially adapted for standard data traffic, where timing constraints
are not tight. Target applications for ABR are email, WWW, file transfer and
variable quality video and voice. The principle of ABR is to divide the available
bandwidth¹ fairly among active traffic sources so that the network should provide
each user with the best rate that is compatible with the current traffic
(best-effort service principle). In ABR connections the allowed cell rate (ACR) is
determined by the network from load information and may vary during the
same connection. The network periodically informs the user about the new rate
he can apply by sending Resource Management (RM) cells back to him. Hence
the source rate is dynamically adjusted according to the available resources of
the network by a feedback control loop (see Fig. 1). Since the allocated rate varies
during a connection with ABR mode, the conformance control is performed by
a dynamic GCRA (DGCRA).



Several ABR conformance algorithms have been proposed to the standardization 
committees. Algorithm Acr [4] can be viewed as a reference for defining the 
control of the user data-flow in the case of ABR. Each time a data cell arrives 
from the ABR terminal at the control interface, Algorithm Acr computes the 
rate that should be applied to this cell. The computational cost induced by Acr 
has been considered too high, and there were several proposals to improve it. 

For instance, it was noticed that the rate change is determined only by the 
departure of RM-cells leaving the control interface towards the ABR terminal 
(called backward RM-cells), and these RM-cells are much less frequent than data 
cells. Hence there is much to gain by scheduling rate changes in advance 
when backward RM-cells are received in the interface. This is the motivation for 
the so-called incremental algorithm Acrl, which has been designed to maintain a 
list of planned rates. This list is updated each time the interface receives a new 
backward RM-cell. Moreover, controlling the rates with the scheduling list of Acrl 
seems to be less expensive than computing the maximum of a list of rates with 
Acr. 

^1 E.g., the bandwidth left over by CBR and VBR traffic. 



2 Data-Flow Control Definition with ABR (Algorithm Acr) 

We now give the data-flow control definition, called here Acr. It was first 
introduced at the ATM Forum in [4] and has since been considered as a 
reference for the other algorithms. 

The principle of the algorithm Acr running in the interface is to manage 
the list of backward RM-cells received from the network (Fig. 1) in order to 
determine the rate that has to be controlled at a given instant. We associate 
with each RM-cell a pair $(t, er)$, where $t$ is the time when the backward 
RM-cell leaves the interface (where conformance control is performed) towards 
the ABR terminal and $er$ is the new rate imposed by the network on the ABR 
terminal. For our discussion we identify a cell with its associated pair. By 
convention we call time (resp. rate) of $c$ the first (resp. second) component 
of a cell $c$. The control device receives the new expected rate value before 
the ABR terminal. Hence, due to the transmission delays in the network, the 
control device should apply at time $t$ a value received at time $t - \tau$, 
where $\tau$ represents a propagation delay equal to the time taken by an 
RM-cell to go from the interface to the ABR terminal and back to the interface. 
However, the propagation times in the network may vary according to the traffic 
load. In order to take into account variations of these delays, the ITU-T has 
proposed that the rate to be controlled at time $t$ is computed as the maximum 
among the rates received by the interface within a temporal window limited by 
$t - \tau_2$ and $t - \tau_3$ and the rate received just before or at 
$t - \tau_2$. The window parameters $\tau_2$, $\tau_3$ satisfy 
$\tau_2 > \tau_3$. They have been negotiated during the establishment of the 
traffic contract. 

More formally, let $l = [(t_1, er_1), (t_2, er_2), \ldots, (t_n, er_n)]$ be a 
list of RM-cells. The first cell of $l$ is $(t_1, er_1)$. The list $l$ is 
time-decreasing (t.d. in short) if $i < j \Rightarrow t_i > t_j$, for all 
$i \in [1..n-1]$ and $j \in [2..n]$. In order to handle limit cases in the 
definitions below we define $t_0 = +\infty$. We denote by $\cdot$ (resp. @) the 
cons (resp. append) operator on lists. 

The rate to be controlled at time $t$ w.r.t. the t.d. list 
$l = [(t_1, er_1), (t_2, er_2), \ldots, (t_n, er_n)]$ of backward RM-cells 
leaving the interface is: 

$$Acr(l,t) = \begin{cases} MaxEr(Wind(l,t)) & \text{if } Wind(l,t) \neq \emptyset, \\ 0 & \text{otherwise} \end{cases}$$

where 

$$Wind(l,t) = \{(t_i, er_i) \in l \mid (t - \tau_2 < t_i \le t - \tau_3) \text{ or } (t_i \le t - \tau_2 < t_{i-1})\}$$

$$MaxEr(s) = \max\{er \mid (t, er) \in s\}$$

For instance, with this rate control policy the end user can benefit at time 
$t + \tau_3$ from a rate increase received on a backward cell at time $t$ by the 
interface, that is, as soon as possible. On the other hand, a rate decrease will 
be taken into account only at time $t + \tau_2$, that is, as late as possible. 
Hence this is a policy in favour of the user and based on worst-case situations. 
It can be noticed that for a fixed list $l$, $Acr(l, t)$ is decreasing in $t$ 
after $t_1 + \tau_3$, since the window is (non-strictly) decreasing for the 
inclusion relation. 
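The definition of Acr can be made concrete with a small sketch. The Python 
below is an illustration only, not the authors' PVS specification: the function 
names `wind`, `max_er` and `acr`, the representation of a cell as a 
`(time, rate)` tuple in a list ordered newest first, and the numeric values 
chosen for the two window parameters are all assumptions made for the example. 

```python
import math

def wind(cells, t, tau2, tau3):
    """Cells of a time-decreasing list that fall in the window at time t."""
    selected = []
    prev_time = math.inf  # plays the role of t_0 = +infinity
    for time, rate in cells:
        in_window = t - tau2 < time <= t - tau3
        last_at_or_before = time <= t - tau2 < prev_time
        if in_window or last_at_or_before:
            selected.append((time, rate))
        prev_time = time
    return selected

def max_er(cells):
    """Largest rate carried by a non-empty collection of cells."""
    return max(rate for _, rate in cells)

def acr(cells, t, tau2, tau3):
    """Rate to be controlled at time t; 0 when the window is empty."""
    window = wind(cells, t, tau2, tau3)
    return max_er(window) if window else 0

# Hypothetical data: three backward RM-cells, newest first, with tau2 > tau3.
TAU2, TAU3 = 5, 2
cells = [(3, 8), (1, 4), (0, 10)]
print([acr(cells, t, TAU2, TAU3) for t in (1, 2, 5, 6, 8)])  # [0, 10, 10, 8, 8]
```

In this run the rate 10 received at time 0 becomes effective at time 
0 + TAU3 = 2, and the later, lower rates only take over once that cell has 
left the window. 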



3 Incremental Conformance Checking (Algorithm Acrl) 



We now introduce an ideal incremental algorithm Acrl for conformance control. 
Unlike Acr, the algorithm Acrl computes a list of rates to be controlled in the 
future. Hence it maintains a list $Prog(l)$ ($l$ is as above) of cells 
$(t_j, er_j)$, each containing a future rate $er_j$ to be controlled together 
with the time $t_j$ at which it comes into effect. Like $l$, $Prog(l)$ will be 
sorted in decreasing order on time. This list is updated when a backward RM-cell 
is received. An important gain over Acr is due to the fact that the RM-cells are 
much less frequent than the data cells. The ratio of RM-cells to all cells 
recommended by the ATM Forum^2 is 1/32. Let us assume that the list $Prog(l)$ 
is constructed. We shall prove later that it is time-decreasing, like $l$. Then 
the rate to be controlled at time $t$ is: 

$$Acrl(l,t) = Prog(l)_t$$

where $p_t$ is a function that computes by extrapolation the rate at time $t$ 
from a list of scheduled rates $p$. This function simply extracts from $p$ the 
first rate value that is scheduled at or immediately before the time $t$. This 
rate is the one to be controlled at $t$. Formally: 

$$p_t = \begin{cases} er_i & \text{if there exists a cell } (t_i, er_i) \in p \text{ such that } t \ge t_i \text{ and there is no cell } (t_j, er_j) \in p \text{ with } t \ge t_j,\ i > j \text{ and } i, j \in [1..n], \\ 0 & \text{otherwise} \end{cases}$$

^2 ATM Forum Traffic Management Specification Version 4.0. 
ftp://ftp.atmforum.com/pub/approved-specs/af-tm-0056.000.ps 

We now describe how the list $Prog(l)$ is updated. A list $l'$ is a prefix of a 
list $l$ if there exists a list $l''$ such that $l = l' @ l''$. Given a list 
$l'$ and a time $t$, we denote by $l'_{>t}$ the maximal prefix of $l'$ 
containing cells with time greater than $t$. Similarly, we define 
$l'_{\le er}$ as the maximal prefix of $l'$ containing cells with rate less 
than or equal to $er$. This gives the following recursive definition of 
$Prog(l)$: 

$$Prog((t, er) \cdot l) = \begin{cases} (t + \tau_3, er) \cdot l' & \text{if } er \ge Prog(l)_{t+\tau_3}, \text{ where } Prog(l) = Prog(l)_{>t+\tau_3} @ l' \\ (t + \tau_2, er) \cdot Prog(l) & \text{else if } Prog(l)_{\le er} \text{ is empty} \\ (t', er) \cdot l'' & \text{else, where } Prog(l) = Prog(l)_{\le er} @ l'' \text{ and } Prog(l)_{\le er} = L @ [(t', er')] \end{cases}$$

$Prog(l)$ is by definition the empty list when $l$ is the empty list. 
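The recursive definition of Prog and the extraction function $p_t$ can be 
sketched as follows. This is a minimal illustration under stated assumptions, 
not the PVS text of the paper: the names `rate_at` and `prog`, the 
`(time, rate)` tuple representation with the newest cell first, and the sample 
values of the window parameters are hypothetical. 

```python
def rate_at(schedule, t):
    """p_t: rate of the first cell scheduled at or before t (0 if none)."""
    for time, rate in schedule:
        if time <= t:
            return rate
    return 0

def prog(cells, tau2, tau3):
    """Scheduling list for a time-decreasing list of backward RM-cells."""
    if not cells:
        return []
    (t, er), rest = cells[0], cells[1:]
    schedule = prog(rest, tau2, tau3)
    if er >= rate_at(schedule, t + tau3):
        # Rate increase: effective at t + tau3; drop the maximal prefix
        # of cells scheduled strictly after that instant.
        k = 0
        while k < len(schedule) and schedule[k][0] > t + tau3:
            k += 1
        return [(t + tau3, er)] + schedule[k:]
    # Rate decrease: find the maximal prefix with rates <= er.
    k = 0
    while k < len(schedule) and schedule[k][1] <= er:
        k += 1
    if k == 0:
        return [(t + tau2, er)] + schedule
    # Replace that prefix by one cell at the time of its last element.
    return [(schedule[k - 1][0], er)] + schedule[k:]

# Hypothetical data: three backward RM-cells, newest first.
TAU2, TAU3 = 5, 2
cells = [(3, 8), (1, 4), (0, 10)]
schedule = prog(cells, TAU2, TAU3)
print(schedule)                                       # [(6, 8), (2, 10)]
print([rate_at(schedule, t) for t in (1, 2, 5, 6, 8)])  # [0, 10, 10, 8, 8]
```

The schedule is rebuilt only when an RM-cell arrives; looking up the rate for a 
data cell is then a scan of a short list rather than a maximum over a window. 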

Our goal in the remainder of the paper is to show that the incremental 
algorithm Acrl delivers the same rate values as the reference algorithm Acr, i.e. 

$$\forall t\ \forall l\quad Acrl(l,t) = Acr(l,t)$$



4 Two Key Properties 

Two properties are used extensively in the main proof. The first one, time_dec, 
states that $Prog(l)$ is sorted in decreasing order on its time components. The 
second one, rate_inc, specifies a prefix of $Prog(l)$ that is sorted in 
increasing order on its rate components. Both assume that $l$ is time-decreasing 
(t.d.). We denote by $Timel(l)$ the time of the first cell of $l$ (0 if $l$ is 
empty). 

Property 1 (time_dec) Given a t.d. cell list $l$, the list $Prog(l)$ is also t.d. 

Proof. By induction on $l$. When $l$ is empty, $Prog(l)$ is empty, hence t.d. 
Assume by induction hypothesis that for any t.d. list $l_1$ of length less than 
or equal to the length of $l$, $Prog(l_1)$ is t.d. Let us prove that 
$Prog((t, er) \cdot l)$ is t.d. when $t > Timel(l)$. We perform a case analysis 
guided by the definition of Prog from Section 3. Note that any sublist of a t.d. 
list is also t.d. 

1. Assume that $er \ge Prog(l)_{t+\tau_3}$. Let $l'$ be such that 
$Prog(l) = Prog(l)_{>t+\tau_3} @ l'$. Then $Prog((t, er) \cdot l)$ is equal to 
$(t + \tau_3, er) \cdot l'$, which is t.d. since $t + \tau_3 > Timel(l')$ and 
$l'$ is t.d. 

2. Otherwise, let $Prog(l) = Prog(l)_{\le er} @ l''$. If $Prog(l)_{\le er}$ is 
empty then the list $(t + \tau_2, er) \cdot Prog(l)$ is t.d. because 
$t + \tau_2 > Timel(Prog(l))$. This can be deduced from (i) the time-decreasing 
property of $(t, er) \cdot l$, and (ii) the fact that for any nonempty t.d. list 
$L = (t, e) \cdot L'$ we have $t + \tau_2 \ge Timel(Prog(L))$. (This can be 
proved by a simple induction on $L$.) 

If $Prog(l)_{\le er}$ is a nonempty list of the form $L @ [(t', er')]$ then 
$(t', er) \cdot l''$ is a t.d. list, as can be deduced from the time-decreasing 
property of $Prog(l)$. □ 

We denote by $l_{\gtrsim t}$ the maximal prefix $l'$ of $l$ such that the time 
of any cell from $l'$, except possibly the last one, is greater than $t$. We 
remark that for any nonempty t.d. list $l$, the prefix $l_{\gtrsim t}$ and $l$ 
have the same first cell. We say that a cell list 
$l = [(t_1, er_1), (t_2, er_2), \ldots, (t_n, er_n)]$ is strictly 
rate-increasing (s.r.i. in short) if the rate values of its cells are strictly 
increasing: $i < j \Rightarrow er_i < er_j$, for all $i, j \in [1..n]$. We can 
observe that given an s.r.i. list $l$ and two time values $t > t'$ we have 
$l_t \le l_{t'}$. 

Property 2 (rate_inc) Given a t.d. cell list $l$, the prefix 
$Prog(l)_{\gtrsim Timel(l)+\tau_3}$ of $Prog(l)$ is s.r.i. 

Proof. By induction on $l$. If $l$ is empty then $Prog(l)$ is empty, hence 
s.r.i. Otherwise, by induction hypothesis we assume that 
$Prog(l)_{\gtrsim Timel(l)+\tau_3}$ is s.r.i. If $t > Timel(l)$ we will prove 
that $Prog((t, er) \cdot l)_{\gtrsim t+\tau_3}$ is also s.r.i. by case analysis 
according to the definition of Prog. 

1. If $er \ge Prog(l)_{t+\tau_3}$ then 
$Prog((t, er) \cdot l) = (t + \tau_3, er) \cdot l'$ where $l'$ is such that 
$Prog(l) = Prog(l)_{>t+\tau_3} @ l'$. It results that 
$Prog((t, er) \cdot l)_{\gtrsim t+\tau_3} = [(t + \tau_3, er)]$, which is s.r.i. 

2. Otherwise let $l''$ be such that $Prog(l) = Prog(l)_{\le er} @ l''$. 

2.1. If $Prog(l)_{\le er}$ is empty then 
$Prog((t, er) \cdot l) = (t + \tau_2, er) \cdot Prog(l)$. We have 
$Prog((t, er) \cdot l)_{\gtrsim t+\tau_3} = (t + \tau_2, er) \cdot Prog(l)_{\gtrsim t+\tau_3}$. 
Moreover, $Prog(l)_{\gtrsim t+\tau_3}$ is a prefix of 
$Prog(l)_{\gtrsim Timel(l)+\tau_3}$ because $t > Timel(l)$. Since 
$Prog(l)_{\le er}$ is empty, $er$ is less than the rate of the first cell of 
$Prog(l)$. Consequently $(t + \tau_2, er) \cdot Prog(l)_{\gtrsim t+\tau_3}$ is 
s.r.i. 

2.2. Otherwise $Prog(l)_{\le er} = L @ [(t', er')]$ for some $L$ and 
$Prog((t, er) \cdot l) = (t', er) \cdot l''$. If $l''$ is empty, 
$Prog((t, er) \cdot l)_{\gtrsim t+\tau_3}$ consists of the unique cell 
$(t', er)$ and is obviously s.r.i. Since $l''$ is a sublist of $Prog(l)$, the 
list $l''_{\gtrsim t+\tau_3}$ is a sublist of $Prog(l)_{\gtrsim t+\tau_3}$. 
Therefore it is s.r.i. Moreover, when $l''$ is not empty, $er$ is less than the 
rate of the first cell of $l''$. We conclude that 
$Prog((t, er) \cdot l)_{\gtrsim t+\tau_3} = (t', er) \cdot l''_{\gtrsim t+\tau_3}$ 
is s.r.i. □ 

5 The Main Proof 

We will employ the following notations. Assume that $x$ is a cell, $Time(x)$ its 
time, $Er(x)$ its rate and $\tau_2$, $\tau_3$ the two window parameters. The 
model checking approaches with MEC [1] and UPPAAL [3] had to assign values to 
these parameters in order to proceed. Here we only assume, throughout the proof 
and without further mention, that $\tau_2 > \tau_3$. We introduce $T_2(x)$ to 
denote $Time(x) + \tau_2$ and $T_3(x)$ to denote $Time(x) + \tau_3$. The 
predicate $S_T(l)$, resp. $S_E(l)$, is true if $l$ is t.d., resp. s.r.i. We will 
omit the quantifier prefixes of the formulas since all the variables are 
considered as universally quantified. 

The proof of the conjecture $Acrl(l,t) = Acr(l,t)$ is immediate if we prove 
the main lemma $P(l)$, where: 

$$P(l) :\quad S_T(l) \Rightarrow Prog(l)_t = MaxEr(Wind(l,t))$$

We apply an induction on the length of $l$. When $l$ is empty, the proof is 
immediate, by expanding the definitions of Prog, MaxEr and Wind. In the step 
case, we suppose $P(l)$ and we try to prove $P(a' \cdot l)$. More precisely, we 
should prove that $Prog(a' \cdot l)_t = MaxEr(Wind(a' \cdot l, t))$ follows from 
$S_T(a' \cdot l)$. From the hypothesis $S_T(a' \cdot l)$ we deduce $S_T(l)$ 
since $l$ is a sublist of $a' \cdot l$. Hence, we can assume that 
$Prog(l)_t = MaxEr(Wind(l,t))$. The arguments for proving the step case are also 
based on the following property depending on the induction hypothesis: 



Property 3 The list $Prog(l)_{\gtrsim T_3(a')}$ is s.r.i. 

Proof. From Property 2 we have 
$S_T(l) \Rightarrow S_E(Prog(l)_{\gtrsim Timel(l)+\tau_3})$. Together with the 
hypothesis $S_T(l)$, we deduce $S_E(Prog(l)_{\gtrsim Timel(l)+\tau_3})$. We also 
have $Time(a') > Timel(l)$ since $a' \cdot l$ is t.d., and consequently 
$Prog(l)_{\gtrsim T_3(a')}$ is a prefix of $Prog(l)_{\gtrsim Timel(l)+\tau_3}$. 
Hence $Prog(l)_{\gtrsim T_3(a')}$ is s.r.i. □ 

By $T_m$ we denote the time of the cell of $Prog(l)$ that yields $Prog(l)_t$. 
Then $t \ge T_m$ and no cell from the list $Prog(l)$ has its time in the 
interval $(T_m, t]$. Let $t_{a'}$ be the time of the first cell of 
$Prog(a' \cdot l)$. We have the following facts about $t_{a'}$. 

1. $t_{a'} \in \{T_2(t'), T_3(t') \mid t' \text{ is the time of an } (a' \cdot l) \text{ cell}\}$, 
as can be easily proved by induction on $l$. 

2. $t_{a'} \in [T_3(a'), T_2(a')]$. Since $a' \cdot l$ is t.d., from the first 
fact it results that $t_{a'} \le T_2(a')$. Moreover, from the definition of 
Prog: (i) if $Er(a') < Acrl(l, T_3(a'))$ then the prefix 
$Prog(l)_{\le Er(a')}$ contains only cells with rates $\le Er(a')$, which 
therefore occur at a time $> T_3(a')$. If $Prog(l)_{\le Er(a')}$ is empty, then 
$t_{a'} = T_2(a')$, otherwise $t_{a'}$ is the time of its last cell. 
Consequently, $t_{a'} > T_3(a')$. (ii) if $Er(a') \ge Acrl(l, T_3(a'))$ then 
$t_{a'} = T_3(a')$. 

The domain where $t_{a'}$ can take its value for each case is included in the 
hatched time interval of the corresponding figure. From the definition of Prog, 
it can be shown that the rate of the first cell of $Prog(a' \cdot l)$ is 
$Er(a')$. Hence $Prog(a' \cdot l) = (t_{a'}, Er(a')) \cdot L''$ for some list 
$L''$. 

We perform a case analysis according to the position of $t$ w.r.t. the values 
$T_2(a')$ and $T_3(a')$: 

1. If $T_2(a') \le t$ (see Fig. 2) then 
$Acrl(a' \cdot l, t) = Acr(a' \cdot l, t)\ (= Er(a'))$ because 

• $MaxEr(Wind(a' \cdot l, t)) = Er(a')$ since $Wind(a' \cdot l, t) = [a']$. 

• $Prog(a' \cdot l)_t = Er(a')$ because $(t_{a'}, Er(a'))$ is the first cell of 
$Prog(a' \cdot l)$ and $t_{a'} \le T_2(a') \le t$. 

2. If $T_3(a') > t$ (see Fig. 3) then 

• $a'$ is not a member of $Wind(a' \cdot l, t)$. Therefore, 
$Acr(a' \cdot l, t) = Acr(l, t)$, and 

• $Acrl(a' \cdot l, t) = Acrl(l, t)$ because $t_{a'} \in [T_3(a'), T_2(a')]$. 



Fig. 3. $T_3(a') > t$ 



By the induction hypothesis $Acrl(l,t) = Acr(l,t)$. It results that 
$Acrl(a' \cdot l, t) = Acr(a' \cdot l, t)$. 

3. If $T_2(a') > t \ge T_3(a')$ we deduce that $a' \in Wind(a' \cdot l, t)$, by 
definition of Wind. 

3.1. If $Wind(l,t)$ is empty then the list $l$ is also empty, since otherwise 
the first cell of $l$ would belong to $Wind(l,t)$. Then 
$Prog([a']) = [(T_3(a'), Er(a'))]$ and, since $t \ge T_3(a')$, 
$Wind([a'], t) = [a']$. Therefore $Acrl([a'], t) = Acr([a'], t) = Er(a')$. 

3.2. If $Wind(l,t)$ is nonempty, let $ER_m$ be the value of the maximal rate of 
the $Wind(l,t)$ cells. According to the induction hypothesis, 
$Acr(l,t) = Acrl(l,t)$. Thus the rate of the $Prog(l)$ cell situated at $T_m$ 
equals $ER_m$. 

In order to schedule $Er(a')$, we perform a case analysis according to the 
definition of Prog: 

3.2.1. If the condition 

$$Acrl(l, T_3(a')) \le Er(a') \qquad (Cond.1)$$

is true (see Fig. 4), then $t_{a'} = T_3(a')$. Moreover $(t_{a'}, Er(a'))$ is 
the first cell of $Prog(a' \cdot l)$. Then $Acrl(a' \cdot l, t) = Er(a')$ 
because $T_3(a') \le t$. 

Fig. 4. $T_2(a') > t \ge T_3(a')$ and $Acrl(l, T_3(a')) \le Er(a')$ 

It remains to prove that $Acr(a' \cdot l, t)$ also equals $Er(a')$ or, 
equivalently, $ER_m \le Er(a')$. 

Again from $T_3(a') \le t$, together with the fact that the sublist 
$Prog(l)_{\gtrsim T_3(a')}$ is s.r.i. according to Property 3, we deduce 
$Acrl(l,t) \le Acrl(l, T_3(a'))$. Therefore by the condition Cond.1 and the 
induction hypothesis we have $ER_m \le Er(a')$. 

3.2.2. If the condition 

$$Acrl(l, T_3(a')) > Er(a') \qquad (Cond.2)$$

is true we know that $t_{a'} > T_3(a')$. We distinguish the following cases: 

3.2.2.1. $T_m \le T_3(a')$ (see Fig. 5). Since $T_3(a') \le t$ it results that 
$t \ge T_3(a') \ge T_m$. By the definition of $T_m$, no cell of $Prog(l)$ has 
its time in the interval $(T_m, t]$. Hence $Acrl(l, T_3(a')) = ER_m$. On the one 
hand, by the condition Cond.2 we have $ER_m > Er(a')$. Consequently 
$Acr(a' \cdot l, t) = ER_m$. On the other hand, the instant $t_{a'}$ can be 
either $T_2(a')$ or the time of a $Prog(l)$ cell. $T_2(a')$ also cannot be 
inside the interval $(T_m, t]$ since $T_2(a') > t$. If $t_{a'} = T_2(a')$ then 
$t_{a'} > t$. Assume now that $t_{a'}$ is the time of a $Prog(l)$ cell. Since 
$t_{a'} > T_3(a')$, $t \ge T_3(a') \ge T_m$ and there is no $Prog(l)$ cell 
located in the interval $(T_m, t]$, we deduce that $t_{a'} > t$. Therefore 
$Acrl(a' \cdot l, t)$ is also equal to $ER_m$. 



Fig. 5. $T_2(a') > t \ge T_3(a')$, $Acrl(l, T_3(a')) > Er(a')$ and $T_m \le T_3(a')$ 



3.2.2.2. $T_m > T_3(a')$ (see Fig. 6). It results that $t \ge T_m > T_3(a')$. 
We have the following cases to consider: 

- if $Er(a') < ER_m$ then $Acr(a' \cdot l, t) = ER_m$. From Property 3 we 
deduce that $t_{a'} > T_m$. $t_{a'}$ can be either $T_2(a')$ or the time of a 
$Prog(l)$ cell, which is greater than $t$ since there is no $Prog(l)$ cell in 
the interval $(T_m, t]$. Hence $t_{a'} > t$ and we can conclude that 
$Acrl(a' \cdot l, t) = Acrl(l, t) = ER_m$. 

- if $Er(a') \ge ER_m$ then $Acr(a' \cdot l, t) = Er(a')$. Again from 
Property 3 we obtain $t_{a'} \le T_m$. Let 
$Prog(l) = Prog(l)_{\le Er(a')} @ l''$. Then 
$(T_m, ER_m) \in Prog(l)_{\le Er(a')}$. It results that 
$Acrl(a' \cdot l, t) = Acrl(a' \cdot l, T_m) = ((t_{a'}, Er(a')) \cdot l'')_t = Er(a')$. 



6 Comments on the Mechanical Proof 

The equivalence theorem has been successfully developed within the PVS [11] 
environment. PVS provides an expressive specification language that builds on 
classical typed higher-order logic with mechanisms for defining abstract 
datatypes. Hence the specification of the algorithms, already given in 
functional style, was relatively easy. In this work we deliberately restricted 
ourselves to first-order features of the language, since our final goal is to 
prove the equivalence theorem with a first-order prover in a more automatic way, 
i.e. with less input from the user. 

The first difficult proof step we encountered was to show that if a list of 
cells $l$ is time-decreasing then its associated scheduling list $Prog(l)$ is 
also time-decreasing: this property is expressed by Property 1. We tried to 
apply induction on the length of the list, but it failed because the scheduling 
list in the induction conclusion is not a sublist of the one in the induction 
hypothesis. Therefore, the direct application of the induction hypothesis was 
impossible. After a closer analysis, we discovered that auxiliary lemmas 
concerning time properties of cell lists were needed. However, the initial 
functions were not sufficient to express them. Hence an important decision was 
to enrich the initial specification with auxiliary functions such as Timel. 

The formal verification of time_dec closely follows the semi-formal proof from 
Section 4. By the rules ASSERT and GRIND, the theorem prover PVS applies 
its decision procedures to perform arithmetic reasoning, employs congruence 
closure for equality reasoning, and installs theories and rewrite rules along 
with all the definitions relevant to the goal in order to prove the trivial 
cases, to simplify complex expressions and definitions, and to perform matching. 
In detail, the formal proof of time_dec requires 32 user steps in PVS, 6 of 
which are ASSERT, 5 are GRIND and 4 are LEMMA; the latter introduce instances of 
previously proved lemmas as new formulas. 

The proof of main_conj, which codes the main lemma, is the most complex one 
elaborated for this specification. It follows the skeleton of the semi-formal 
proof described in Section 5. We applied an INDUCT rule to perform induction on 
the length of the cell list. The base case was completed by a GRIND operation. 
However, the proof of the step case required 27 lemma invocations and 6 
applications of the CASE rule to perform case reasoning. The analysis of each 
particular case was a source for the development of new lemmas that in turn may 
require new lemmas for their proofs. The depth of the lemma dependency graph 
is 7. 

The analysis of some particular cases presented in the proof of the main_conj 
lemma suggested other auxiliary functions to express additional properties. 
Examples are the auxiliary functions ListUpTo(l,t) and SortedE(l), denoted 
respectively by $l_{\gtrsim t}$ and $S_E(l)$, which have been introduced in 
Section 4 to formalize Property 2. Some of the cases follow very closely the 
corresponding cases from the informal proof, such as 1, 2, 3.1 and 3.2.1. The 
cases 3.2.2.1 and 3.2.2.2 are more complex and required 17 lemma invocations. 
The proof of main_conj consists of 120 user-guided steps, 7 of which are GRIND 
operations and 29 are ASSERT commands, which indicates that arithmetic and 
equality reasoning have been used intensively. 

The whole proof takes 78.97 s on a PC with an Intel Pentium II processor at 
333 MHz and 128 Mbytes of RAM. The effort to design the mechanical proof took 
about two months, including the time to get familiar with PVS. 



7 Conclusions and Perspectives 



Our objective was to derive the first mechanical proof of an ideal incremental 
ABR conformance algorithm. A proof by hand of the algorithm had been designed 
before [13]. However, this kind of manual proof is not fully convincing in 
general, since limit cases or apparently trivial arguments are often omitted and 
may be a source of mistakes. In this respect our machine proof gives more 
evidence of the correctness of the algorithm, since every single step has been 
verified by PVS. 

On the one hand, the specification of the algorithm as a recursive function 
in PVS was relatively easy. On the other hand, the proof we obtained was rather 
involved. It required about 80 intermediate lemmas and the introduction 
of auxiliary definitions. We feel that there is room for optimization and that 
many inference steps can possibly be simplified. We also plan to search for a 
proof with more automated tools, and we expect to derive entirely automatic 
proofs for many lemmas. In another direction, we are also considering 
higher-order specification and reasoning techniques to derive a proof that is 
more synthetic and therefore easier to grasp. Finally, we should prove that, by 
limiting the size of the scheduling lists, the ideal ABR conformance algorithm 
indeed reduces to approximate algorithms such as B’. 

Acknowledgements: We thank Laurent Fribourg, Bernhard Gramlich and 
Jean-François Monin for interesting discussions and remarks about this work. 

Abstract. The verification of continuous-time Markov chains (CTMCs) 
against continuous stochastic logic (CSL) [3,6], a stochastic branching-time 
temporal logic, is considered. CSL facilitates among others the specification 
of steady-state properties and the specification of probabilistic 
timing properties of the form 
$\mathcal{P}_{\bowtie p}(\Phi_1\ \mathcal{U}^I\ \Phi_2)$, for state formulas 
$\Phi_1$ and $\Phi_2$, comparison operator $\bowtie$, probability $p$, and real 
interval $I$. The main result of this paper is that model checking probabilistic 
timing properties can be reduced to the problem of computing transient state 
probabilities for CTMCs. This allows us to verify such properties by using 
efficient techniques for transient analysis of CTMCs such as uniformisation. A 
second result is that a variant of ordinary lumping equivalence (i.e., 
bisimulation), a well-known notion for aggregating CTMCs, preserves the 
validity of all CSL-formulas. 



1 Introduction 

Continuous-time Markov chains (CTMCs) have been widely used to determine 
important system performance and reliability characteristics. To mention just 
a few applications, these models have been used to quantify the throughput of 
production lines, to determine the mean time between failure in safety-critical 
systems, or to identify bottlenecks in communication networks. Due to the ra- 
pidly increasing size and complexity of systems, obtaining such models in a 
direct way becomes more and more cumbersome and error-prone. An effective 
solution to this problem is to generate CTMCs from higher-level specifications, 
like queueing networks, stochastic Petri nets [1], or stochastic process algebras 
[20,24]. 

Although these approaches have been shown to be rather valuable (several 
industrial case studies have been carried out and mature tool support is 
available [19,21]), the specification of the measure of interest is mostly done 
informally. The analysis of the CTMC most often boils down to the determination 
of steady-state and transient state probabilities. Steady-state probabilities 
refer to the system behaviour in the “long run”, while the transient 
probabilities consider the system at a fixed time instant t. 

In [3] measures of interest of CTMCs are specified in a branching-time logic 
CSL (continuous stochastic logic) that includes a TCTL-like time-bounded until 
operator $\mathcal{U}^I$, where $I$ is a time-interval, and a probabilistic 
operator $\mathcal{P}_{\bowtie p}(\cdot)$ to reason about the probabilities of 
timing properties. As in the logic PCTL [17], a probabilistic variant of CTL 
interpreted over DTMCs, the operator $\mathcal{P}_{\bowtie p}(\varphi)$ 
replaces the usual CTL path quantifiers $\forall$ and $\exists$ and refers to 
the probability for the event specified by the path formula $\varphi$. The 
subscript $\bowtie p$ (where $\bowtie$ is a comparison operator and 
$p \in [0,1]$) specifies a lower or upper bound for the “allowed” 
probabilities. The combination of the probabilistic operator with the temporal 
operator $\Diamond^{[t,t]}$ (which can be derived from the time-bounded until 
operator) analyses the quantitative behaviour at time instant $t$ and can be used 
to reason about transient probabilities. For instance, error) asserts 

that the probability for a system error at time instant 4 is at most 10“^. 

In [6], CSL was extended by the usual next step and until operators and 
by a novel steady-state operator, e.g. the formula 
$\mathcal{S}_{\ge 0.98}(up)$ asserts that the steady-state probability for the 
system “being up” is at least 0.98. Moreover, [6] presented a model checking 
algorithm for the extended version of CSL that uses a variant of multi-terminal 
BDDs [12,4], thus obtaining a single framework that combines the traditional 
approach of steady-state and transient analysis of CTMCs with the symbolic 
BDD-based model checking approach for temporal logics. 

While [6] focuses on model checking with techniques that have been proven to 
be very efficient for non-stochastic systems, viz. BDDs, in this paper we 
investigate the complementary question and present a CSL model checking 
algorithm that operates with well-understood efficient techniques for analyzing 
CTMCs, namely transient analysis of CTMCs represented by sparse matrices. The 
main difficulty is the treatment of $\mathcal{P}_{\bowtie p}(\varphi)$ applied 
to a path formula $\varphi$ of the form $\Phi_1\ \mathcal{U}^I\ \Phi_2$.^1 Our 
main result states that, for a given CTMC $\mathcal{M}$ and state $s$ in 
$\mathcal{M}$, the measure $Prob^{\mathcal{M}}(s, \varphi)$ for the event that 
$\varphi$ holds when the system starts in state $s$ can be calculated by means 
of a transient analysis of the CTMC $\mathcal{M}'$, which can easily be derived 
from $\mathcal{M}$. This allows us to adopt efficient techniques for performing 
transient analysis of CTMCs, like uniformisation [15,16,25,27], for model 
checking probabilistic timing properties. In addition, we show that (ordinary) 
lumping equivalence, a notion on Markov chains to aggregate state spaces 
[10,24] that can be viewed as a continuous variant of probabilistic 
bisimulation [26], preserves the validity of all CSL-formulas. This allows us 
to switch from the original state space to the (possibly much smaller) quotient 
space under lumping equivalence. Using this property, we indicate how the state 
space for checking probabilistic timing properties on the derived CTMC 
$\mathcal{M}'$ can be obtained. 

^1 The steady-state operator and the probabilistic operator applied to next 
step or (unbounded) until require essentially matrix operations like 
multiplication and solving linear equation systems that can be treated by 
standard tools for sparse matrices. 

Organisation of the paper. Section 2 introduces CTMCs and CSL. Section 3 
presents a reduction of the model checking problem for time-bounded until to 
a transient analysis on CTMCs. Section 4 discusses lumping equivalence and 
preservation of CSL-properties. Section 5 reports on the checking of properties 
on a large plain-old telephone system [21]. Section 6 concludes the paper. 



2 CTMCs and CSL 



In this section, we briefly recall the basic concepts of CTMCs [28] and the logic 
CSL [3,6]. We slightly depart from the standard notations for CTMCs and 
consider a CTMC as an ordinary transition system (Kripke structure) where the 
edges are equipped with probabilistic timing information. Let AP be a fixed, 
finite set of atomic propositions. 

CTMCs. A (labelled) CTMC $\mathcal{M}$ is a tuple $(S, R, L)$ where $S$ is a 
finite set of states, $R : S \times S \to \mathbb{R}_{\ge 0}$ the rate 
matrix^2, and $L : S \to 2^{AP}$ the labelling function which assigns to each 
state $s \in S$ the set $L(s)$ of atomic propositions $a \in AP$ that are valid 
in $s$. A state $s$ is called absorbing iff $R(s, s') = 0$ for all states $s'$. 
We assume that for any state $s$, $AP$ contains an atomic proposition $a_s$ 
which is characteristic for $s$, i.e., $a_s \in L(s)$ and $a_s \notin L(s')$ 
for any $s' \neq s$. 

Intuitively, $R(s, s') > 0$ iff there is a transition from $s$ to $s'$; 
$1 - e^{-R(s,s') \cdot t}$ is the probability that the transition $s \to s'$ can 
be triggered within $t$ time units. Thus, the delay of transition $s \to s'$ is 
governed by an exponential distribution with rate $R(s, s')$. If $R(s, s') > 0$ 
for more than one state $s'$, a competition between the transitions originating 
in $s$ exists, known as the race condition. The probability to move from a 
non-absorbing state $s$ to a particular state $s'$ within $t$ time units, i.e., 
$s \to s'$ wins the race, is given by 

$$P(s, s', t) = \frac{R(s, s')}{E(s)} \cdot \left(1 - e^{-E(s) \cdot t}\right)$$

where $E(s) = \sum_{s' \in S} R(s, s')$ denotes the total rate at which any 
transition emanating from state $s$ is taken. More precisely, $E(s)$ specifies 
that the probability of leaving $s$ within $t$ time units is 
$1 - e^{-E(s) \cdot t}$, due to the fact that the minimum of exponential 
distributions (competing in a race) is characterised by the sum of their rates. 
Consequently, the probability of moving from a non-absorbing state $s$ to $s'$ 
by a single transition, denoted $P(s, s')$, is determined by the probability 
that the delay of going from $s$ to $s'$ finishes before the delays of other 
outgoing edges from $s$; formally, $P(s, s') = R(s, s')/E(s)$. For an absorbing 
state $s$, the total rate $E(s)$ is 0. (In this case, we have $P(s, s') = 0$ 
for any state $s'$.) 

^2 We do not set $R(s, s) = -\sum_{s' \neq s} R(s, s')$, as is usual for CTMCs. 
In our setting, self-loops at a state $s$ are possible and can be modelled by 
$R(s, s) > 0$. The inclusion of self-loops alters neither the transient nor the 
steady-state behaviour of the CTMC, but allows the usual interpretation of 
linear-time temporal operators like next step and unbounded or time-bounded 
until. 

The initial state probabilities of $\mathcal{M} = (S, R, L)$ are given by an 
initial distribution $\alpha : S \to [0, 1]$ with 
$\sum_{s \in S} \alpha(s) = 1$. In case we have a unique initial state $s$, the 
initial distribution is denoted $\alpha_s^1$, where $\alpha_s^1(s) = 1$ and 
$\alpha_s^1(s') = 0$ for any $s' \neq s$. 



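Example 1. As a running example we address a triple modular redundant system
(TMR) taken from [18], a fault-tolerant computer system consisting of three
processors and a single (majority) voter that we model by a CTMC where state
$s_{i,j}$ models that $i$ processors and $j$ voters are operational. Initially all components
are functioning correctly (i.e., $\alpha = \alpha^1_{s_{3,1}}$). The failure rate of a processor is $\lambda$ and
of the voter $\nu$ failures per hour (fph). The expected repair time of a processor
is $1/\mu$ and of the voter $1/\delta$ hours. The system is operational if at least two
processors and the voter are functioning correctly. If the voter fails, the entire
system is assumed to have failed, and after a repair (with rate $\delta$) the system is
assumed to start "as good as new". The details of the CTMC are:

[Figure: state-transition diagram of the TMR CTMC; state labels include up_3 and up_2, and the edge from up_3 to up_2 carries rate $3\lambda$.]

With the states ordered $s_{3,1}, s_{2,1}, s_{1,1}, s_{0,1}, s_{0,0}$:

$$\mathbf{R} = \begin{pmatrix}
0 & 3\lambda & 0 & 0 & \nu \\
\mu & 0 & 2\lambda & 0 & \nu \\
0 & \mu & 0 & \lambda & \nu \\
0 & 0 & \mu & 0 & \nu \\
\delta & 0 & 0 & 0 & 0
\end{pmatrix}
\quad\text{and}\quad
E = \begin{pmatrix}
3\lambda + \nu \\
2\lambda + \mu + \nu \\
\lambda + \mu + \nu \\
\mu + \nu \\
\delta
\end{pmatrix}$$

We have e.g., $\mathbf{P}(s_{2,1}, s_{3,1}) = \mu/(\mu + 2\lambda + \nu)$ and $\mathbf{P}(s_{0,1}, s_{0,0}) = \nu/(\mu + \nu)$. ■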
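The rate matrix, exit rates and race probabilities of Example 1 can be sketched in a few lines of Python. This is only an illustration: the state names (`s31`, ..., `s00`), the dict-of-dicts encoding and the function names are assumptions made here, not notation from the paper.

```python
import math

# Hypothetical state names: "sij" stands for i operational processors, j voters.
STATES = ["s31", "s21", "s11", "s01", "s00"]

def tmr_rates(lam, nu, mu, delta):
    """Rate matrix R of the TMR example as a sparse dict of dicts."""
    return {
        "s31": {"s21": 3 * lam, "s00": nu},
        "s21": {"s31": mu, "s11": 2 * lam, "s00": nu},
        "s11": {"s21": mu, "s01": lam, "s00": nu},
        "s01": {"s11": mu, "s00": nu},
        "s00": {"s31": delta},
    }

def exit_rate(R, s):
    """E(s): total rate of leaving state s."""
    return sum(R[s].values())

def jump_prob(R, s, s2):
    """P(s, s') = R(s, s') / E(s), and 0 for an absorbing state s."""
    E = exit_rate(R, s)
    return R[s].get(s2, 0.0) / E if E > 0 else 0.0

def race_prob(R, s, s2, t):
    """P(s, s', t): probability that s -> s' wins the race within t time units."""
    E = exit_rate(R, s)
    return jump_prob(R, s, s2) * (1.0 - math.exp(-E * t)) if E > 0 else 0.0
```

For instance, `jump_prob(tmr_rates(0.01, 0.001, 1.0, 0.2), "s21", "s31")` evaluates the quotient $\mu/(\mu + 2\lambda + \nu)$ mentioned in the example.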



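Paths. A path $\sigma$ is a finite or infinite sequence $s_0, t_0, s_1, t_1, s_2, t_2, \ldots$ with, for
$i \in \mathbb{N}$, $s_i \in S$ and $t_i \in \mathbb{R}_{>0}$ such that $\mathbf{R}(s_i, s_{i+1}) > 0$, if $\sigma$ is infinite. For an
infinite path $\sigma$ and $i \in \mathbb{N}$ let $\sigma[i] = s_i$, the $i$-th state of $\sigma$, and $\delta(\sigma, i) = t_i$, the
time spent in $s_i$. For $t \in \mathbb{R}_{\geq 0}$ and $i$ the smallest index with $t \leq \sum_{j=0}^{i} t_j$ let
$\sigma@t = \sigma[i]$, the state of $\sigma$ at time $t$. If $\sigma$ is finite and ends in $s_l$, we require that
$s_l$ is absorbing, and $\mathbf{R}(s_i, s_{i+1}) > 0$ for all $i < l$. For finite $\sigma$, $\sigma[i]$ and $\delta(\sigma, i)$
are defined for $i \leq l$ in the above way, whereas $\delta(\sigma, l) = \infty$, and $\sigma@t = s_l$ for
$t > \sum_{j=0}^{l-1} t_j$. Let $\mathit{Path}^{\mathcal{M}}$ denote the set of paths in $\mathcal{M}$, and $\mathit{Path}^{\mathcal{M}}(s)$ the set of paths
starting in $s$.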
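As a small illustration of the notation $\sigma@t$, the following Python sketch looks up the state occupied at time $t$. The representation of a path as a list of `(state, sojourn_time)` pairs, with `math.inf` as the sojourn time of the last state of a finite path, is an assumption made for this example.

```python
import math

def state_at(path, t):
    """Return sigma@t: the state with the smallest index i such that
    t <= t_0 + ... + t_i. `path` is a list of (state, sojourn_time) pairs;
    a finite path ends with sojourn time math.inf."""
    elapsed = 0.0
    for state, sojourn in path:
        elapsed += sojourn
        if t <= elapsed:
            return state
    raise ValueError("t lies beyond the supplied prefix of the path")
```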

Borel space. An initial distribution $\alpha$ yields a probability measure $\Pr_\alpha$ on
paths as follows. Let $s_0, \ldots, s_k \in S$ with $\mathbf{R}(s_i, s_{i+1}) > 0$ $(0 \leq i < k)$, and
$I_0, \ldots, I_{k-1}$ non-empty intervals in $\mathbb{R}_{\geq 0}$. Then, $C(s_0, I_0, \ldots, I_{k-1}, s_k)$ denotes
the cylinder set consisting of all paths $\sigma \in \mathit{Path}(s_0)$ such that $\sigma[i] = s_i$ $(i \leq k)$,
and $\delta(\sigma, i) \in I_i$ $(i < k)$. Let $\mathcal{F}(\mathit{Path})$ be the smallest $\sigma$-algebra on $\mathit{Path}$ which
contains all sets $C(s, I_0, \ldots, I_{k-1}, s_k)$ where $s_0, \ldots, s_k$ ranges over all state-sequences
with $s = s_0$, $\mathbf{R}(s_i, s_{i+1}) > 0$ $(0 \leq i < k)$, and $I_0, \ldots, I_{k-1}$ ranges over
all sequences of non-empty intervals in $\mathbb{R}_{\geq 0}$. The probability measure $\Pr_\alpha$ on
$\mathcal{F}(\mathit{Path})$ is the unique measure defined by induction on $k$ by $\Pr_\alpha(C(s_0)) = \alpha(s_0)$
and for $k \geq 0$:

$$\Pr_\alpha(C(s_0, \ldots, s_k, I', s')) = \Pr_\alpha(C(s_0, \ldots, s_k)) \cdot \mathbf{P}(s_k, s') \cdot \left(e^{-E(s_k)\cdot a} - e^{-E(s_k)\cdot b}\right)$$

where $a = \inf I'$ and $b = \sup I'$. (For $b = \infty$ and $\lambda > 0$ let $e^{-\lambda \cdot \infty} = 0$.)
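The inductive definition of the cylinder-set measure translates directly into a loop. The sketch below is a minimal illustration under assumed encodings: rates as a dict of dicts, the initial distribution as a dict, and each interval as a pair `(a, b)` of its infimum and supremum (with `math.inf` allowed for `b`).

```python
import math

def cylinder_prob(alpha, R, states, intervals):
    """Pr_alpha of the cylinder set C(s0, I0, ..., I_{k-1}, sk).
    `states` lists s0..sk, `intervals` lists (inf, sup) pairs for I0..I_{k-1}."""
    if len(intervals) != len(states) - 1:
        raise ValueError("need exactly one interval per transition")
    prob = alpha.get(states[0], 0.0)          # base case: Pr(C(s0)) = alpha(s0)
    for s, (a, b), s_next in zip(states, intervals, states[1:]):
        E = sum(R[s].values())                # exit rate E(s)
        if E == 0.0:
            return 0.0                        # absorbing state: no outgoing jump
        jump = R[s].get(s_next, 0.0) / E      # P(s, s')
        prob *= jump * (math.exp(-E * a) - math.exp(-E * b))
    return prob
```

Note that `math.exp(-E * math.inf)` evaluates to `0.0` for positive `E`, which matches the convention stated above for $b = \infty$.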

Remark 1. For an infinite path $\sigma = s_0, t_0, s_1, t_1, \ldots$ we do not assume time
divergence. Although $\sum_{j \geq 0} t_j$ might converge, in which case $\sigma$ represents an
"unrealistic" computation where infinitely many transitions are taken in a finite
amount of time, the probability measure of such non-time-divergent paths is 0
(independent of $\alpha$). This allows a lazy treatment of the notation $\sigma@t$ in the
description of measurable sets of paths for which we just refer to the probability
measure. ■



Steady state and transient probabilities. For a CTMC two major types of
state probabilities are distinguished: steady-state probabilities where the system
is considered "in the long run", i.e., when an equilibrium has been reached, and
transient probabilities where the system is considered at a given time instant $t$.
Formally, the transient probability

$$\pi^{\mathcal{M}}(\alpha, s', t) = \Pr_\alpha\{\sigma \in \mathit{Path}^{\mathcal{M}} \mid \sigma@t = s'\}$$

stands for the probability to be in state $s'$ at time $t$ given the initial distribution
$\alpha$. Steady-state probabilities are defined as $\pi^{\mathcal{M}}(\alpha, s') = \lim_{t \to \infty} \pi^{\mathcal{M}}(\alpha, s', t)$.
This limit always exists for finite CTMCs. For $S' \subseteq S$, let $\pi^{\mathcal{M}}(\alpha, S') = \sum_{s' \in S'} \pi^{\mathcal{M}}(\alpha, s')$
denote the steady-state probability for $S'$ given $\alpha$, i.e.,

$$\pi^{\mathcal{M}}(\alpha, S') = \lim_{t \to \infty} \Pr_\alpha\{\sigma \in \mathit{Path}^{\mathcal{M}} \mid \sigma@t \in S'\}.$$

We let $\pi^{\mathcal{M}}(\alpha, \emptyset) = 0$. We often omit the superscript $\mathcal{M}$ if the CTMC $\mathcal{M}$ is clear
from the context. In case of a unique initial state $s$, i.e., $\alpha = \alpha^1_s$, we write $\Pr_s$
for $\Pr_\alpha$, $\pi(s, s', t)$ for $\pi(\alpha, s', t)$, and $\pi(s, s')$ for $\pi(\alpha, s')$.

Syntax of CSL. CSL is a branching-time temporal logic where the state-formulas
are interpreted over states of a CTMC [3,6]. As in [6] we consider the
extension of CSL of [3] with $\mathcal{S}_{\bowtie p}(\cdot)$ to reason about steady-state probabilities.
We generalise CSL as defined in [6] with a time-bounded until operator that
is parametrized by an arbitrary time-interval $I$. Let $a \in AP$, $p \in [0,1]$ and
$\bowtie\ \in \{\leq, \geq\}$. The state-formulas of CSL are defined by:

$$\Phi ::= \mathrm{tt} \;\mid\; a \;\mid\; \Phi \wedge \Phi \;\mid\; \neg\Phi \;\mid\; \mathcal{S}_{\bowtie p}(\Phi) \;\mid\; \mathcal{P}_{\bowtie p}(\varphi)$$

where, for interval $I \subseteq \mathbb{R}_{\geq 0}$, path-formulas are defined by:

$$\varphi ::= X\Phi \;\mid\; \Phi\, \mathcal{U}\, \Phi \;\mid\; \Phi\, \mathcal{U}^I\, \Phi.$$

The other boolean connectives are derived in the usual way, i.e. $\mathrm{ff} = \neg\mathrm{tt}$,
$\Phi_1 \vee \Phi_2 = \neg(\neg\Phi_1 \wedge \neg\Phi_2)$, and $\Phi_1 \to \Phi_2 = \neg\Phi_1 \vee \Phi_2$. The meaning of $\mathcal{U}$ ("until")
and $X$ ("next step") is standard. The temporal operator $\mathcal{U}^I$ is the timed
variant of $\mathcal{U}$; $\Phi_1\, \mathcal{U}^I\, \Phi_2$ asserts that $\Phi_2$ will be satisfied at some time
instant in the interval $I$ and that at all preceding time instants $\Phi_1$ holds.
$\mathcal{S}_{\bowtie p}(\Phi)$ asserts that the steady-state probability for a $\Phi$-state falls in the interval
$I_{\bowtie p} = \{q \in [0,1] \mid q \bowtie p\}$. $\mathcal{P}_{\bowtie p}(\varphi)$ asserts that the probability measure
of the paths satisfying $\varphi$ meets the bound given by $\bowtie p$.

Temporal operators like $\Diamond$, $\Box$ and their real-time variants $\Diamond^I$ or $\Box^I$ can be
derived, e.g. $\mathcal{P}_{\bowtie p}(\Diamond^I \Phi) = \mathcal{P}_{\bowtie p}(\mathrm{tt}\, \mathcal{U}^I\, \Phi)$ and $\mathcal{P}_{\geq p}(\Box\Phi) = \mathcal{P}_{\leq 1-p}(\Diamond\neg\Phi)$.

Example 2. Let $AP = \{up_i \mid 0 \leq i < 4\} \cup \{down\}$ and consider the CTMC of
Example 1. $\mathcal{P}_{\leq 10^{-5}}(\Diamond^{[0,10]}\, down)$ denotes that the probability of a failure of the
voter within the next 10 hours is at most $10^{-5}$; the formula $\mathcal{S}_{\geq 0.99}(up_3 \vee up_2)$
asserts that with probability at least 0.99 the system is operational, when the system is
in equilibrium. ■

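Semantics of CSL. The state-formulas are interpreted over the states of a
CTMC. Let $\mathcal{M} = (S, \mathbf{R}, L)$ with labels in $AP$. The definition of the satisfaction
relation $\models\ \subseteq S \times \mathrm{CSL}$ is as follows. Let $\mathrm{Sat}(\Phi) = \{s \in S \mid s \models \Phi\}$.

$s \models \mathrm{tt}$ for all $s \in S$
$s \models a$ iff $a \in L(s)$
$s \models \neg\Phi$ iff $s \not\models \Phi$
$s \models \Phi_1 \wedge \Phi_2$ iff $s \models \Phi_i$, $i = 1, 2$
$s \models \mathcal{S}_{\bowtie p}(\Phi)$ iff $\pi(s, \mathrm{Sat}(\Phi)) \in I_{\bowtie p}$
$s \models \mathcal{P}_{\bowtie p}(\varphi)$ iff $\mathit{Prob}^{\mathcal{M}}(s, \varphi) \in I_{\bowtie p}$.

Here, $\mathit{Prob}^{\mathcal{M}}(s, \varphi)$ denotes the probability measure of all paths $\sigma \in \mathit{Path}$ satisfying
$\varphi$ when the system starts in state $s$, i.e., $\mathit{Prob}^{\mathcal{M}}(s, \varphi) = \Pr_s\{\sigma \in \mathit{Path} \mid \sigma \models \varphi\}$.³
The satisfaction relation for the path-formulas is defined as:

$\sigma \models X\Phi$ iff $\sigma[1]$ is defined and $\sigma[1] \models \Phi$
$\sigma \models \Phi_1\, \mathcal{U}\, \Phi_2$ iff $\exists k \geq 0.\ (\sigma[k] \models \Phi_2 \wedge \forall 0 \leq i < k.\ \sigma[i] \models \Phi_1)$
$\sigma \models \Phi_1\, \mathcal{U}^I\, \Phi_2$ iff $\exists t \in I.\ (\sigma@t \models \Phi_2 \wedge \forall u \in [0, t[.\ \sigma@u \models \Phi_1)$

We note that for $I = \emptyset$ the formula $\Phi_1\, \mathcal{U}^I\, \Phi_2$ is not satisfiable and that $\Phi_1\, \mathcal{U}\, \Phi_2$
can be interpreted as an abbreviation of $\Phi_1\, \mathcal{U}^{[0,\infty)}\, \Phi_2$.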

Remark 2. Although CSL does not contain an explicit transient state operator,
it is possible to reason about transient state probabilities as we have:
$\pi(s, s', t) = \mathit{Prob}(s, \Diamond^{[t,t]}\, at_{s'})$. Thus, whereas the steady-state operator $\mathcal{S}_{\bowtie p}(\Phi)$
cannot be derived from the other operators, a transient-state operator can be
defined as $\mathcal{P}_{\bowtie p}(\Diamond^{[t,t]}\, \Phi)$. It states that the probability for a $\Phi$-state at time
point $t$ meets the bound $\bowtie p$. ■

3 Model Checking $\mathcal{U}^I$ by Transient Analysis

In [6], we presented a CSL model checking algorithm that essentially relies on the
following ideas. The steady-state operator requires the computation of steady-state
probabilities which can be obtained by a graph analysis and by solving
a linear equation system. The basis for calculating the probabilities $\mathit{Prob}(s, \varphi)$
are the following results. For the temporal operators next $X$ and until $\mathcal{U}$ (that
abstract from the amount of time spent in states but just refer to the states that
are passed in an execution), we have the same characterizations for the values
$\mathit{Prob}(s, X\Phi)$ and $\mathit{Prob}(s, \Phi_1\, \mathcal{U}\, \Phi_2)$ as in the case of DTMCs [17]:

$$\mathit{Prob}(s, X\Phi) = \sum_{s' \models \Phi} \mathbf{P}(s, s')$$

and

$$\mathit{Prob}(s, \Phi_1\, \mathcal{U}\, \Phi_2) = \begin{cases}
1 & \text{if } s \models \Phi_2 \\
\sum_{s' \in S} \mathbf{P}(s, s') \cdot \mathit{Prob}(s', \Phi_1\, \mathcal{U}\, \Phi_2) & \text{if } s \models \Phi_1 \wedge \neg\Phi_2 \\
0 & \text{otherwise.}
\end{cases}$$

³ The fact that the set $\{\sigma \in \mathit{Path} \mid \sigma \models \varphi\}$ is measurable can be easily verified.

This amounts to matrix/vector-multiplication for next and solving a linear equa- 
tion system for until. 
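The two characterizations can be sketched as follows. Note the swap: instead of solving the linear equation system directly, this illustration iterates the until equations from the all-zero vector, which approaches their least solution. The encoding of the embedded DTMC as a dict of dicts `P` (with an entry for every state) and the function names are assumptions of this example.

```python
def prob_next(P, sat_phi, s):
    """Prob(s, X Phi): one-step probability of moving into a Phi-state."""
    return sum(p for s2, p in P[s].items() if s2 in sat_phi)

def prob_until(P, sat1, sat2, tol=1e-12, max_iter=100000):
    """Approximate Prob(s, Phi1 U Phi2) for every state s by fixed-point
    iteration on the until equations (a stand-in for a direct linear solve)."""
    x = {s: (1.0 if s in sat2 else 0.0) for s in P}
    for _ in range(max_iter):
        change = 0.0
        for s in P:
            if s in sat2 or s not in sat1:
                continue                      # value fixed at 1 or 0
            new = sum(p * x[s2] for s2, p in P[s].items())
            change = max(change, abs(new - x[s]))
            x[s] = new
        if change < tol:
            break
    return x
```

Starting from zero matters when some $\Phi_1$-states can never reach a $\Phi_2$-state: the equations then have several solutions, and the iteration settles on the smallest one.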

For the time-bounded until operator, [6] suggested an iterative method which
relies on the observation that the function $(s, t, t') \mapsto \mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,t']}\, \Phi_2)$ can be
characterized as the least fixed point of a higher-order operator $\Omega$ where $\Omega(F)$
is defined by means of Volterra integrals. This fixed-point characterization then
serves as a basis for an iterative method that uses numerical integration techniques.
First experiments in a non-symbolic setting have shown that this approach
can be rather time-consuming and that numerical stability is hard to achieve [22].
[6] suggested a symbolic approach by combining (MT)BDD-techniques [9,12,4] 
with an operator for solving integrals by quadrature formulas. Here, we propose 
an alternative strategy that reduces the model checking problem for the time- 
bounded until operator to the problem of calculating transient probabilities in 
CTMCs. This observation allows us to implement CSL model checking on the 
basis of well-established transient analysis techniques for CTMCs (see below) . 

Four correctness-preserving transformations. We first observe that it suffices
to consider time bounds specified by compact intervals, since:

$$\mathit{Prob}(s, \Phi_1\, \mathcal{U}^{I}\, \Phi_2) = \mathit{Prob}(s, \Phi_1\, \mathcal{U}^{cl(I)}\, \Phi_2)$$

where $cl(I)$ denotes the closure of $I$. Secondly, unbounded time intervals $[t, \infty)$
can be treated by combining time-bounded until and unbounded until, since:

$$\mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,\infty)}\, \Phi_2) = \sum_{s' \in S} \mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,t]}\, at_{s'}) \cdot \mathit{Prob}(s', \Phi_1\, \mathcal{U}\, \Phi_2).$$

In the sequel, we treat 4 types of time-bounded until-formulas with a compact
interval $I$ and show how they all can be reduced to instances of two simple base
cases. For CTMC $\mathcal{M} = (S, \mathbf{R}, L)$ and CSL-state formula $\Phi$ let CTMC $\mathcal{M}[\Phi]$
result from $\mathcal{M}$ by making all $\Phi$-states in $\mathcal{M}$ absorbing; i.e., $\mathcal{M}[\Phi] = (S, \mathbf{R}', L)$
where $\mathbf{R}'(s, s') = \mathbf{R}(s, s')$ if $s \not\models \Phi$ and 0 otherwise. Note that $\mathcal{M}[\Phi_1][\Phi_2] = \mathcal{M}[\Phi_1 \vee \Phi_2]$.
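The construction $\mathcal{M}[\Phi]$ amounts to clearing the outgoing rates of every $\Phi$-state. A minimal sketch, assuming the rate matrix is stored as a dict of dicts and $\mathrm{Sat}(\Phi)$ is given as a set of states (both encodings are choices made for this example):

```python
def make_absorbing(R, sat_phi):
    """Return the rate matrix of M[Phi]: states in sat_phi lose all outgoing
    transitions, every other row is copied unchanged."""
    return {s: ({} if s in sat_phi else dict(out)) for s, out in R.items()}
```

Applying the function twice with sets `A` and `B` gives the same result as applying it once with `A | B`, which mirrors the identity $\mathcal{M}[\Phi_1][\Phi_2] = \mathcal{M}[\Phi_1 \vee \Phi_2]$.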

Case A: Bounded until for absorbing $\Phi_2$-states. Let $\varphi = \Phi_1\, \mathcal{U}^{[0,t]}\, \Phi_2$ and assume
that all $\Phi_2$-states are absorbing, i.e., once a $\Phi_2$-state is reached it will not be
left anymore. We first observe that once a $(\neg\Phi_1 \wedge \neg\Phi_2)$-state is reached, $\varphi$ will
be invalid, regardless of the future evolution of the system. As a result, we may
switch from $\mathcal{M}$ to $\mathcal{M}[\neg\Phi_1 \wedge \neg\Phi_2]$ and consider the property on the obtained
CTMC. The assumption that all $\Phi_2$-states are absorbing allows us to conclude
that $\varphi$ is satisfied once a $\Phi_2$-state is reached at time $t$. Thus,

Lemma 1. If all $\Phi_2$-states are absorbing in $\mathcal{M}$ (i.e., $\mathcal{M} = \mathcal{M}[\Phi_2]$) then:

$$\mathit{Prob}^{\mathcal{M}}(s, \Phi_1\, \mathcal{U}^{[0,t]}\, \Phi_2) = \mathit{Prob}^{\mathcal{M}'}(s, \Diamond^{[t,t]}\, \Phi_2) = \sum_{s'' \models \Phi_2} \pi^{\mathcal{M}'}(s, s'', t)$$

for $\mathcal{M}' = \mathcal{M}[\neg\Phi_1 \wedge \neg\Phi_2]$. ■

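Case B: Point-interval until for $\Phi_2 \to \Phi_1$. Let $\varphi = \Phi_1\, \mathcal{U}^{[t,t]}\, \Phi_2$ and assume
$\Phi_2 \to \Phi_1$. Note that such an implication holds in case of $\Diamond$-properties. With the
same motivation as for the previous case, we make $(\neg\Phi_1 \wedge \neg\Phi_2)$-states absorbing.
Since $\Phi_2 \to \Phi_1$ it follows that $\mathit{Prob}(s, \varphi)$ equals the probability to be in a $\Phi_2$-state
at time $t$ in the obtained CTMC:

Lemma 2. If $\Phi_2 \to \Phi_1$ we have for any CTMC $\mathcal{M}$:

$$\mathit{Prob}^{\mathcal{M}}(s, \Phi_1\, \mathcal{U}^{[t,t]}\, \Phi_2) = \mathit{Prob}^{\mathcal{M}'}(s, \Diamond^{[t,t]}\, \Phi_2) = \sum_{s'' \models \Phi_2} \pi^{\mathcal{M}'}(s, s'', t)$$

for $\mathcal{M}' = \mathcal{M}[\neg\Phi_1 \wedge \neg\Phi_2]$. ■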

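Case C: Bounded until. Let $\varphi = \Phi_1\, \mathcal{U}^{[0,t]}\, \Phi_2$ and consider an arbitrary CTMC
$\mathcal{M}$. This property is fulfilled if a $\Phi_2$-state is reached before (or at) time $t$ via
some $\Phi_1$-path. Once such a $\Phi_2$-state has been reached, the future behaviour of
the CTMC is irrelevant for the validity of $\varphi$. Accordingly, the $\Phi_2$-states can be
safely made absorbing without affecting the validity of $\varphi$. As a result, it suffices
to consider the probability of being in a $\Phi_2$-state at time $t$ for $\mathcal{M}[\Phi_2]$, thus
reducing to the case in Lemma 1. As $\mathcal{M}[\Phi_2][\neg\Phi_1 \wedge \neg\Phi_2] = \mathcal{M}[\neg\Phi_1 \vee \Phi_2]$ we
obtain:

Theorem 1. For any CTMC $\mathcal{M}$:

$$\mathit{Prob}^{\mathcal{M}}(s, \Phi_1\, \mathcal{U}^{[0,t]}\, \Phi_2) = \sum_{s'' \models \Phi_2} \pi^{\mathcal{M}[\neg\Phi_1 \vee \Phi_2]}(s, s'', t).$$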

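Case D: Interval-until. Let $\varphi = \Phi_1\, \mathcal{U}^{[t,t']}\, \Phi_2$ with $0 < t \leq t'$ and let $\mathcal{M}$ be
an arbitrary CTMC.⁴ We first observe that for any path $\sigma$ with $\sigma \models \varphi$: (i) $\Phi_1$
continuously holds in the interval $[0, t]$ (i.e., $\sigma@u \models \Phi_1$ for all $u \leq t$); in particular, $s' = \sigma@t$
is a $\Phi_1$-state, and (ii) $\sigma' \in \mathit{Path}(s')$, the suffix of $\sigma$ that starts at time $t$,⁵ fulfills
the path formula $\Phi_1\, \mathcal{U}^{[0,t'-t]}\, \Phi_2$. Let the intermediate state $s' \in \mathrm{Sat}(\Phi_1)$ and
consider the set $\Gamma(s')$ of paths $\sigma \in \mathit{Path}(s)$ where $\sigma@t = s'$ and $\sigma \models \varphi$. Then
$\Pr_s(\Gamma(s'))$ equals $\mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,t]}\, at_{s'})$ times $\mathit{Prob}(s', \Phi_1\, \mathcal{U}^{[0,t'-t]}\, \Phi_2)$. As the sets
$\Gamma(s')$ for $s' \in \mathrm{Sat}(\Phi_1)$ are pairwise disjoint we obtain:

$$\mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,t']}\, \Phi_2) = \sum_{s' \models \Phi_1} \mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,t]}\, at_{s'}) \cdot \mathit{Prob}(s', \Phi_1\, \mathcal{U}^{[0,t'-t]}\, \Phi_2).$$

To compute the probabilities $\mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,t]}\, at_{s'})$ for $s' \models \Phi_1$ we use Lemma 2,
i.e., we switch from $\mathcal{M}$ to $\mathcal{M}_1 = \mathcal{M}[\neg\Phi_1]$ and compute the transient probabilities
for any $\Phi_1$-state $s'$ at time $t$ in $\mathcal{M}_1$. The probabilities $\mathit{Prob}(s', \Phi_1\, \mathcal{U}^{[0,t'-t]}\, \Phi_2)$
can be obtained as in Theorem 1. This yields the following result:

Theorem 2. For any CTMC $\mathcal{M}$ and $0 < t \leq t'$:

$$\mathit{Prob}^{\mathcal{M}}(s, \Phi_1\, \mathcal{U}^{[t,t']}\, \Phi_2) = \sum_{s' \models \Phi_1} \sum_{s'' \models \Phi_2} \pi^{\mathcal{M}[\neg\Phi_1]}(s, s', t) \cdot \pi^{\mathcal{M}[\neg\Phi_1 \vee \Phi_2]}(s', s'', t'-t).$$

⁴ Note that $\mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[t,t']}\, \Phi_2) \neq \mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[0,t']}\, \Phi_2) - \mathit{Prob}(s, \Phi_1\, \mathcal{U}^{[0,t]}\, \Phi_2)$.

⁵ Formally, $\sigma'$ is the unique path with $\sigma'@x = \sigma@(t+x)$ for any positive real $x$.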

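With Theorem 2 we can calculate the values $\mathit{Prob}^{\mathcal{M}}(s, \varphi)$, using one of the
following two methods. Let $\mathcal{M}_2 = \mathcal{M}[\neg\Phi_1 \vee \Phi_2]$. Either we calculate the matrices

$$A = \left(\pi^{\mathcal{M}_1}(s, s', t)\right)_{s \in S,\ s' \in \mathrm{Sat}(\Phi_1)}, \qquad
B = \left(\pi^{\mathcal{M}_2}(s', s'', t'-t)\right)_{s' \in \mathrm{Sat}(\Phi_1),\ s'' \in \mathrm{Sat}(\Phi_2)}$$

and then take the product $A \cdot B$. Or, we first calculate the distributions $\alpha_s$
for $s \in S$, given by $\alpha_s(s') = \pi^{\mathcal{M}_1}(s, s', t)$, and then compute
$\mathit{Prob}^{\mathcal{M}}(s, \varphi) = \sum_{s'' \models \Phi_2} \pi^{\mathcal{M}_2}(\alpha_s, s'', t'-t)$. This alternative is based on the observation

$$\sum_{s' \models \Phi_1} \alpha_s(s') \cdot \pi^{\mathcal{M}_2}(s', s'', t'-t) = \pi^{\mathcal{M}_2}(\alpha_s, s'', t'-t)$$

for any $\Phi_2$-state $s''$.⁶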

Example 3. Consider our TMR with initial distribution $\alpha = \alpha^1_{s_{3,1}}$ and let
$\Phi = \mathcal{P}_{\geq 0.15}(\Phi_1\, \mathcal{U}^{[3,7]}\, \Phi_2)$ for $\Phi_1 = up_3 \vee up_2$ and $\Phi_2 = up_2 \vee up_1$. According to Theorem 2,
model checking $\Phi$ boils down to first computing the transient probabilities
at time 3, i.e., $\alpha_2 = (\pi^{\mathcal{M}_1}(s_{3,1}, s, 3))_{s \in S}$ in CTMC $\mathcal{M}_1$ of Fig. 1(a) where all $\neg\Phi_1$-states
are made absorbing. We obtain $\alpha_2 = (0.968, 0.0272, 0.011, 0, 0.003)$ with a
precision of $\varepsilon = 10^{-6}$ for $\lambda = 0.01$, $\nu = 0.001$, $\mu = 1.0$ and $\delta = 0.2$. In the second
phase, we compute the transient probabilities at time 4 in CTMC $\mathcal{M}_2$ of Fig. 1(b)
starting from initial distribution $\alpha_2$, i.e., computing $\sum_{s'' \models \Phi_2} \pi^{\mathcal{M}_2}(\alpha_2, s'', 4) \approx 0.1365$.
Thus, the property $\Phi$ is violated.
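The two-phase computation of Theorem 2 can be replayed on the TMR example with a short Python sketch. Several things here are assumptions made for illustration: the state names, the labelling (taking $up_i$ to hold exactly in the state with $i$ processors and a working voter), and the helper `transient`, which uses a naive Poisson-weight recursion rather than a numerically robust algorithm. Following Theorem 2, only the probability mass on $\Phi_1$-states is carried into the second phase.

```python
import math

def tmr_rates(lam, nu, mu, delta):
    """Rate matrix of the TMR chain; "sij" = i processors, j voters (assumed names)."""
    return {
        "s31": {"s21": 3 * lam, "s00": nu},
        "s21": {"s31": mu, "s11": 2 * lam, "s00": nu},
        "s11": {"s21": mu, "s01": lam, "s00": nu},
        "s01": {"s11": mu, "s00": nu},
        "s00": {"s31": delta},
    }

def make_absorbing(R, absorbing):
    return {s: ({} if s in absorbing else dict(out)) for s, out in R.items()}

def transient(R, alpha, t, eps=1e-9):
    """State probabilities at time t from (sub-)distribution alpha, by uniformisation."""
    E = {s: sum(out.values()) for s, out in R.items()}
    q = max(E.values())
    if q == 0.0 or t == 0.0:
        return {s: alpha.get(s, 0.0) for s in R}
    weight = math.exp(-q * t)                 # Poisson probability PP(0)
    if weight == 0.0:
        raise OverflowError("q*t too large for this naive recursion")
    pi = {s: alpha.get(s, 0.0) for s in R}    # pi_0 = alpha
    result = {s: weight * pi[s] for s in R}
    covered, i = weight, 0
    while covered < 1.0 - eps:
        i += 1
        nxt = {s: mass * (1.0 - E[s] / q) for s, mass in pi.items()}
        for s, mass in pi.items():            # pi_i = pi_{i-1} * P, P = I + Q/q
            for s2, rate in R[s].items():
                nxt[s2] += mass * rate / q
        pi = nxt
        weight *= q * t / i                   # PP(i) from PP(i-1)
        covered += weight
        for s in R:
            result[s] += weight * pi[s]
    return result

def interval_until(R, s, sat1, sat2, t, t2):
    """Prob(s, Phi1 U^[t,t2] Phi2) via the two transient analyses of Theorem 2."""
    states = set(R)
    m1 = make_absorbing(R, states - sat1)             # M[not Phi1]
    m2 = make_absorbing(R, (states - sat1) | sat2)    # M[not Phi1 or Phi2]
    phase1 = transient(m1, {s: 1.0}, t)
    alpha = {u: p for u, p in phase1.items() if u in sat1}
    phase2 = transient(m2, alpha, t2 - t)
    return sum(p for u, p in phase2.items() if u in sat2)
```

With the parameters of the example, `interval_until(tmr_rates(0.01, 0.001, 1.0, 0.2), "s31", {"s31", "s21"}, {"s21", "s11"}, 3.0, 7.0)` should come out close to the value 0.1365 reported above, i.e. below the bound 0.15.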

Uniformisation. Based on the general principle of uniformisation [25], efficient
techniques to compute transient state probabilities for CTMCs have been proposed
[16,15]. With uniformisation, the transient probabilities of a CTMC are
computed via a so-called uniformised DTMC which characterises the CTMC at
state transition epochs.

⁶ For both alternatives, in the worst case, for any state $s$, we need a transient analysis
in $\mathcal{M}_1$ (with initial state $s$) and $\mathcal{M}_2$ (with initial distribution $\alpha_s$). The second
alternative might be preferable if there are only a few different distributions $\alpha_s$.

Fig. 1. CTMCs to be analysed for checking $\mathcal{P}_{\geq 0.15}((up_3 \vee up_2)\, \mathcal{U}^{[3,7]}\, (up_2 \vee up_1))$

Denoting with $\pi(\alpha, t)$ the vector of state probabilities at time $t$, i.e., $\pi(\alpha, t) =
(\pi(\alpha, s_1, t), \ldots, \pi(\alpha, s_N, t))$ (with $N = |S|$ the number of states), the Chapman-Kolmogorov
differential equations characterise the transient behaviour: $\pi'(\alpha, t)
= \pi(\alpha, t) \cdot Q$, where $Q = \mathbf{R} - \mathrm{diag}(E)$. A formal solution is then given by the
Taylor-series expansion:

$$\pi(\alpha, t) = \alpha \cdot e^{Q \cdot t} = \alpha \cdot \sum_{i=0}^{\infty} \frac{(Q \cdot t)^i}{i!}.$$

This solution, however, should not be used as the basis for a numerical algorithm
since: (i) it suffers from numerical instability due to the fact that $Q$ contains
both positive and negative entries; (ii) the matrix powers will become less and
less sparse for large $i$, thus requiring $O(N^2)$ storage; (iii) it is difficult to find a
proper truncation criterion for the infinite summation.

Instead, by choosing $q = \max_i\{E(s_i)\}$, we construct the uniformised DTMC
with transition probability matrix $\mathbf{P} = \mathbf{I} + Q/q$. By the choice of $q$, $\mathbf{P}$ is a
stochastic matrix. Substituting $Q = q(\mathbf{P} - \mathbf{I})$ in the above solution, we obtain

$$\pi(\alpha, t) = \alpha \cdot e^{q(\mathbf{P} - \mathbf{I})t} = \alpha \cdot e^{-qt} \sum_{i=0}^{\infty} \frac{(qt)^i}{i!}\, \mathbf{P}^i$$

which can be rewritten as

$$\pi(\alpha, t) = \sum_{i=0}^{\infty} PP(i) \cdot \pi_i,$$

where $PP(i) = e^{-qt}\, (qt)^i / i!$ is the $i$-th Poisson probability with parameter $qt$, and
$\pi_i = \pi_{i-1} \mathbf{P}$ and $\pi_0 = \alpha$. The Poisson probabilities can be computed in a stable
way with the algorithm of Fox and Glynn [14]. There is no need to compute explicit
powers of the matrix $\mathbf{P}$. Furthermore, since the terms in the summation are
all between 0 and 1, the number of terms to be taken given a required accuracy
can be computed a priori. For large values of $qt$, this number is of order $O(qt)$.
Notice, however, that for large values of $qt$, the DTMC described by $\mathbf{P}$ might
even have reached steady-state, so that a further reduction in computational
complexity is reached. For further details, see [28,18].

Regarding storage complexity, we note that we require $O(5N)$ storage for the
probability vectors and $O(\eta N)$ for the matrix $\mathbf{P}$, where $\eta$ denotes the (average)
number of transitions originating from a single state in the DTMC (typically
$\eta \ll N$). Regarding computational complexity, to compute $\pi(\alpha, t)$ we require
the sum of $O(qt)$ vectors, each of which is the result of a matrix-vector multiplication.
Given a sparse implementation of the latter, we require $O(\eta N)$ scalar
multiplications for that, so that we have an overall computational complexity of
$O(qt \cdot \eta N)$.
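The uniformisation scheme can be sketched in Python as below. This is a minimal illustration, not the implementation used for the experiments: it uses dense lists of lists instead of a sparse matrix, and it computes the Poisson probabilities by a naive recursion (which underflows for large $qt$) in place of the Fox-Glynn algorithm. The function names are assumptions of this example.

```python
import math

def poisson_terms(qt, eps):
    """Poisson(qt) probabilities PP(0..k), where k is the first index at which
    the accumulated mass reaches 1 - eps (the a priori truncation point)."""
    weights = [math.exp(-qt)]
    total = weights[0]
    while total < 1.0 - eps:
        weights.append(weights[-1] * qt / len(weights))
        total += weights[-1]
    return weights

def uniformise(R, alpha, t, eps=1e-10):
    """Transient distribution pi(alpha, t) for a dense rate matrix R."""
    n = len(R)
    E = [sum(row) for row in R]
    q = max(E)
    if q == 0.0:
        return list(alpha)                    # every state is absorbing
    # Uniformised DTMC: P = I + Q/q with Q = R - diag(E).
    P = [[R[i][j] / q + ((1.0 - E[i] / q) if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = poisson_terms(q * t, eps)
    pi = list(alpha)                          # pi_0 = alpha
    result = [weights[0] * x for x in pi]
    for w in weights[1:]:
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        result = [r + w * x for r, x in zip(result, pi)]
    return result
```

On a two-state chain with rates $a$ (from state 0 to 1) and $b$ (back), the result can be compared against the closed form $\pi_0(t) = \frac{b}{a+b} + \frac{a}{a+b}\, e^{-(a+b)t}$ for a start in state 0.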

4 Abstraction with Bisimulation (Lumping) Equivalence 

In this section, we discuss some techniques to reduce the state space of a CTMC. 
These techniques are mainly based on the observation that (a slight variant 
of) ordinary lumping equivalence (i.e., bisimulation) preserves all CSL-formulas. 
This result is in the spirit of [8] where bisimilar states of an ordinary transition 
system are shown to satisfy the same CTL-formulas. Similar results have been 
established for many types of transition systems and branching-time logics; e.g., 
in the probabilistic setting, [2] shows that probabilistic bisimulation on DTMCs 
preserves PCTL [17]. Our result below can be considered as the continuous 
version of that result. Let $\mathcal{M} = (S, \mathbf{R}, L)$ be a CTMC, $F$ a set of CSL-formulas,
and $L_F : S \to 2^F$ a labelling defined by $L_F(s) = \{\Phi \in F \mid s \models \Phi\}$.

Definition 1. An $F$-bisimulation on $\mathcal{M} = (S, \mathbf{R}, L)$ is an equivalence $R$ on $S$
such that whenever $(s, s') \in R$ then $L_F(s) = L_F(s')$ and $\mathbf{R}(s, C) = \mathbf{R}(s', C)$ for
all $C \in S/R$. States $s$ and $s'$ are $F$-bisimilar iff there exists an $F$-bisimulation
$R$ that contains $(s, s')$.

Here, $S/R$ denotes the quotient space and $\mathbf{R}(s, C)$ abbreviates $\sum_{s' \in C} \mathbf{R}(s, s')$.
$F$-bisimulation is a slight variant of Markovian bisimulation (which is defined
on CTMCs with action-labelled transitions) on CTMCs with labelled states.
Markovian bisimulation coincides with (ordinary) lumping equivalence [11], a
well-known notion to aggregate CTMCs.

For $s \in S$, let $[s]_R$ denote the equivalence class of $s$ under $R$. For $\mathcal{M} =
(S, \mathbf{R}, L)$ we define the CTMC $\mathcal{M}/R = (S/R, \mathbf{R}_R, L_R)$ with $\mathbf{R}_R([s]_R, C) =
\mathbf{R}(s, C)$ and $L_R([s]_R) = L_F(s)$. That is, $\mathcal{M}/R$ results from $\mathcal{M}$ by building the
quotient space under $R$ and labelling states with $F$ (rather than $AP$). $\mathcal{M}/R$
can be computed by a modified version of the partition refinement algorithm
for ordinary bisimulation without an increase of the worst case complexity [23].
Let $\mathrm{CSL}_F$ denote the smallest set of CSL-formulas that includes $F$ and that is
closed under all CSL-operators. In the following we write $\models_{\mathcal{M}}$ for the satisfaction
relation $\models$ (on CSL) on $\mathcal{M}$.

Theorem 3. Let $R$ be an $F$-bisimulation on $\mathcal{M}$ and $s$ a state in $\mathcal{M}$. Then:

(a) For all $\mathrm{CSL}_F$-formulas $\Phi$: $s \models_{\mathcal{M}} \Phi$ iff $[s]_R \models_{\mathcal{M}/R} \Phi$.

(b) For all $\mathrm{CSL}_F$ path-formulas $\varphi$: $\mathit{Prob}^{\mathcal{M}}(s, \varphi) = \mathit{Prob}^{\mathcal{M}/R}([s]_R, \varphi)$.

In particular, $F$-bisimilar states satisfy the same $\mathrm{CSL}_F$ formulas.

Proof. Straightforward by structural induction on $\Phi$ and $\varphi$.

Theorem 3 allows us to verify CSL-formulas on the possibly much smaller $\mathcal{M}/R$
rather than on $\mathcal{M}$, for $AP$-bisimulation $R$.

In addition, we can apply the above result to our transformations of the
previous section by using the following observation. From Theorem 3(b) and
Remark 2 it follows:

$$\sum_{s' \models \Phi} \pi^{\mathcal{M}}(s, s', t) = \pi^{\mathcal{M}/R}([s]_R, [\mathrm{Sat}(\Phi)]_R, t) \qquad (1)$$

for any formula $\Phi \in \mathrm{CSL}_F$ and $F$-bisimulation $R$. This observation allows us to
simplify the CTMCs $\mathcal{M}[\ldots]$ that occur in the cases A-D of our model checking
procedure presented in the previous section in the following way. For cases C
and D, we compute the transient probabilities for $\Phi_2$-states in the CTMC $\mathcal{M}' =
\mathcal{M}[\neg\Phi_1 \vee \Phi_2]$. Let $F = \{\neg\Phi_1 \wedge \neg\Phi_2,\ \Phi_2\}$ and $R$ be the smallest equivalence on
the state space $S$ of $\mathcal{M}'$ that identifies all $\Phi_2$-states and all $(\neg\Phi_1 \wedge \neg\Phi_2)$-states.
Clearly, $R$ is an $F$-bisimulation on $\mathcal{M}'$. The state space of $\mathcal{M}'/R$ is

$$S/R = \mathrm{Sat}(\Phi_1 \wedge \neg\Phi_2) \cup [\mathrm{Sat}(\Phi_2)]_R \cup [\mathrm{Sat}(\neg\Phi_1 \wedge \neg\Phi_2)]_R.$$

Since $\Phi_2$ is a $\mathrm{CSL}_F$-formula, equation (1) yields

$$\sum_{s'' \models \Phi_2} \pi^{\mathcal{M}'}(s, s'', t) = \pi^{\mathcal{M}'/R}(s, [\mathrm{Sat}(\Phi_2)]_R, t)$$

for any state $s \in \mathrm{Sat}(\Phi_1 \wedge \neg\Phi_2)$. Similar arguments are applicable to cases A
and B. As a result, the sets $[\mathrm{Sat}(\neg\Phi_1 \wedge \neg\Phi_2)]_R$ and $[\mathrm{Sat}(\Phi_2)]_R$ in cases A-D
can be considered as single states. This may yield a substantial reduction of the
state space of the CTMC under consideration. From a computational point of
view, the switch from $\mathcal{M}$ to the modified $\mathcal{M}[\ldots]/R$ is quite simple as we just
collapse certain states into a single absorbing state. The generator matrix $\mathbf{R}_R$
for $\mathcal{M}[\ldots]/R$ can be obtained by simple manipulations of the generator matrix
$\mathbf{R}$ for $\mathcal{M}$ (matrix multiplication).
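The collapsing step can be sketched as a small quotient construction. In the hedged example below, the partition is supplied by the caller as a list of frozensets, the rate matrix is a dict of dicts, and only the rate condition $\mathbf{R}(s, C) = \mathbf{R}(s', C)$ of Definition 1 is checked; agreement of the labels $L_F$ is assumed to hold for the given partition.

```python
def lump(R, blocks, tol=1e-12):
    """Quotient rate matrix R_R for a partition `blocks` of the state space.
    Raises ValueError if two states of a block disagree on some R(s, C)."""
    def rate_into(s, block):
        return sum(r for s2, r in R[s].items() if s2 in block)

    quotient = {}
    for b in blocks:
        rows = [{c: rate_into(s, c) for c in blocks} for s in b]
        for row in rows[1:]:
            if any(abs(row[c] - rows[0][c]) > tol for c in blocks):
                raise ValueError("partition is not a lumping equivalence")
        quotient[b] = {c: r for c, r in rows[0].items() if r > 0.0}
    return quotient
```

A collapsed block of absorbing states yields an empty row, i.e. a single absorbing state in the quotient, and the rates from the remaining states into that block are simply summed.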

Example 4. According to the above observations, in the CTMC of Fig. 1(a) we
may aggregate states $[\mathrm{Sat}(\neg\Phi_1)]_R = \{s_{0,1}, s_{0,0}, s_{1,1}\}$ into a single state. This new
state is reachable from $s_{3,1}$ with rate $\nu$ and from $s_{2,1}$ with rate $2\lambda + \nu$. In the
CTMC of Fig. 1(b) we may collapse $[\mathrm{Sat}(\Phi_2)]_R = \{s_{2,1}, s_{1,1}\}$ and $[\mathrm{Sat}(\neg\Phi_1 \wedge
\neg\Phi_2)]_R = \{s_{0,0}, s_{0,1}\}$ into single states.

5 Model Checking a Telephone System

In this section, we report on model checking the stochastic behaviour of an in- 
stance of the plain-old telephone system (POTS), where two users concurrently 
try to get connected to each other. In [21], we have shown how a formal spe- 
cification of the POTS (in LOTOS) can be augmented with stochastic timing 
constraints, leading to a model of more than 10^ states. We aggregated this mo- 
del compositionally using appropriate stochastic extensions of (strong and weak) 
bisimulation [23], to come up with a lumped CTMC $\mathcal{M}$ of 720 states. Here we
model check the resulting CTMC using transient analysis. In short, the following
atomic propositions are used: conn characterises states where both partners are
connected to each other and a conversation is running; fed_up characterises states
where either of the users is hooking the phone because he is apparently out of
luck, unable to reach his conversation partner. Our basic time unit is 1 minute.
The following properties are checked: 

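- $\mathcal{P}_{\bowtie p}(\Diamond^{[0,t]}\, conn)$, the probability of being connected within $t$ minutes.
- $\mathcal{P}_{\bowtie p}(\Diamond^{[t,t]}\, conn)$, the probability of being connected after exactly $t$ minutes.
- $\mathcal{P}_{\bowtie p}(\Diamond^{[100,100+t]}\, conn)$, the probability of being connected at some time between
  100 and $100+t$ minutes.
- $\mathcal{P}_{\bowtie p}(\neg fed\_up\ \mathcal{U}^{[t,100+t]}\, conn)$, the probability of a running conversation between
  $t$ and $100+t$ minutes, without failing to get connected beforehand.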

Note that only the first property can be checked with the current implementation
[22] of the non-symbolic model checking algorithm based on numerical
integration [6]. We do not instantiate $p$, as the execution times and computed
probabilities will be the same for all $p \in\ ]0, 1[$. Statistics of the computation
time needed to check these formulas globally, i.e., for all states, are depicted in
Table 1. They have been obtained by means of a trial implementation of the
uniformisation method written in C, running on a 300 MHz SUN Ultra 5 workstation
with 256 MB memory under the Solaris 2.6 operating system. (In all
cases reported below, the memory requirements are less than 20 MB.)



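Columns, from left to right: $\mathcal{P}_{\bowtie p}(\Diamond^{[0,t]}\, conn)$, $\mathcal{P}_{\bowtie p}(\Diamond^{[t,t]}\, conn)$, $\mathcal{P}_{\bowtie p}(\Diamond^{[100,100+t]}\, conn)$, and $\mathcal{P}_{\bowtie p}(\neg fed\_up\ \mathcal{U}^{[t,100+t]}\, conn)$.

         [0,t] conn            [t,t] conn            [100,100+t] conn      not fed_up U[t,100+t] conn
t        MV-mult.  time        MV-mult.  time        MV-mult.  time        MV-mult.  time
0.1      59        7.75        138       41.91       15,325    3,549.70    5,234     351.75
1        102       10.75       267       75.04       15,368    3,552.69    5,349     378.05
10       583       38.40       1,714     416.38      15,848    3,580.27    6,795     747.21
100      5,081     303.33      15,265    3,541.84    20,347    3,845.27    20,347    3,956.01
1000     8,901     619.51      39,155    8,835.01    24,167    4,161.41    151,815   34,405.14
10000    8,901     624.68      39,155    8,902.41    24,167    4,166.52    151,815   34,567.23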



Table 1. Computation time (in sec) and number of matrix- vector multiplications 
(times 10^) needed for checking CSL properties by means of uniformisation 



From these statistics, we draw the following conclusions. (1) We observe a roughly
linear dependency between the time bound $t$ and the run-time of uniformisation,
due to the fact that the (precomputed) number of iterations needed by
uniformisation is $O(t)$. (2) The number of iterations needed for $t \geq 1000$ is constant,
due to the fact that our algorithm has a built-in steady-state detection.
In other words, the chain at time 1000 is already behaving close to equilibrium,
up to a truncation error $\varepsilon$ (set to $10^{-6}$ in all experiments). For time-bounds larger
than 1000, transient analysis could in fact be replaced by a (much cheaper)
steady-state analysis. (3) The times needed to check $\mathcal{P}_{\bowtie p}(\Diamond^{[t,t]}\, conn)$ are approximately
one order of magnitude higher than those needed for $\mathcal{P}_{\bowtie p}(\Diamond^{[0,t]}\, conn)$.
This is a consequence of the fact that the pruning of transitions in $\mathcal{M}[conn]$ (cf.
Theorem 1) leads to a CTMC containing mutually unreachable parts, and for
each state we perform transient analysis only on the reachable (lumped) CTMC.
On average, this chain has only 62 states, explaining the order of magnitude difference.
Note that according to Lemma 2, transient analysis can take the original
chain (with 720 states) unchanged to check $\mathcal{P}_{\bowtie p}(\Diamond^{[t,t]}\, conn)$, since $conn \to \mathrm{tt}$.
The two rightmost formulas involve more than one (iterative) transient solution,
on different lumped CTMCs (cf. Theorem 2). Their time consumption is mainly
determined by the size of the lower time bound (an observation that does not
hold in general).

6 Concluding Remarks 

The main result of this paper is that the verification problem for probabilistic
timing properties, i.e., CSL-formulas of the form $\mathcal{P}_{\bowtie p}(\Phi_1\, \mathcal{U}^I\, \Phi_2)$, is reducible to
a transient analysis of CTMCs. Thus, efficient techniques for transient analysis,
such as uniformisation, can be adopted for model checking these formulas.
In addition, we showed that a slight variant of (ordinary) lumpability on CTMCs
preserves all CSL-formulas. We illustrated these results by analysing a
plain-old telephone system. Future work includes the adaptation of partial uniformisation
[27] to our setting (thus allowing a partial search of the state space) and
considering a symbolic variant of our presented approach using multi-terminal
BDDs [4,12] in order to compare this approach with the symbolic (numerical
integration) approach in [6]. The extension of our approach towards Markov
reward models is reported in [7].

Abstract. In this paper we present a semi-algorithm to do compositional
model-checking for hybrid systems. We first define a modal logic
$L^h_\nu$ which is expressively complete for linear hybrid automata. We then
show that it is possible to extend the result on compositional model-checking
for parallel compositions of finite automata and networks of
timed automata to linear hybrid automata. Finally we present some results
obtained with an extension of the tool CMC to handle a subclass
of hybrid automata (the stopwatch automata).



1 Introduction 

Model checking for timed and hybrid systems. Model-checking algorithms for 
finite-state automata have been extended to timed automata [ACD93] and tools 
like KRONOS [Yov97] or UPPAAL [LPY97] have been used successfully also 
for verifying many industrial applications [BGK+96,MY96]. Hybrid systems 
[ACH+95] are a strong extension of timed automata and model-checking (or 
reachability) is undecidable [HKPV98] for these models. Nevertheless semi-algo- 
rithms have been implemented in tools like HyTech [HHWT97]. 

Heuristics for model-checking. In the timed verification area, model-checking is
decidable but its complexity is high (PSPACE-complete or EXPTIME-complete)
[ACD93,AL99]. Much work deals with heuristics to overcome this complexity
blow-up (symbolic approaches [HNSY94], efficient and compact data structures
[BLP+99], on-the-fly algorithms [BTY97] etc). From a practical point of
view it is interesting to have different methods to verify a system: an approach 
can be very efficient over some classes of systems while another one works well 
for other classes. This last point is crucial for hybrid systems as model-checking 
is undecidable: for instance the so-called forward and backward reachability ana- 
lysis algorithms [ACH+95,AHH96] are complementary. 

Compositional Model-Checking. An alternative method to standard model-checking
is compositional model-checking [And95,LL95,LPY95]. Given a system
$S = (H_1 \mid \cdots \mid H_n)$ and a property $\varphi$, the method consists in building a quotient
formula $\varphi/H_n$ s.t. $(H_1 \mid \cdots \mid H_n) \models \varphi$ iff $(H_1 \mid \cdots \mid H_{n-1}) \models \varphi/H_n$. Some simplification
techniques can be applied in order to keep a small size for $\varphi/H_n$. By
quotienting every component of the system one after the other, the remaining
problem (if the last formula is not reduced to tt nor ff by simplifications) is to
check a quotient property $\varphi'$ on the nil process, which is an automaton that
cannot do any action. Figure 1 describes the two steps of the method.

E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 373-388, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000

F. Cassez and F. Laroussinie





[Figure: Step 1 applies quotienting and simplifications to a formula $\varphi \in L^h_\nu$; Step 2 is nil-model checking or constraints solving.]

Fig. 1. Compositional Model Checking overview.



Our contribution. We propose here a compositional algorithm for linear hybrid
automata. First we introduce a new modal logic $L^h_\nu$ (a kind of hybrid
$\mu$-calculus) which uses variables. $L^h_\nu$ allows us to express many kinds of properties;
in particular it can be used to do compositional model-checking. Then
we present reduction strategies to simplify formulas. When the $H_i$'s are hybrid
automata, the two steps of the compositional method may not terminate (they
require computing fixed points over polyhedra), but it is possible to use abstractions
for the first step in such a way that (1) termination is ensured and (2)
$S \models \varphi \Leftrightarrow \mathit{nil} \models \varphi/S$ still holds. Then any problem $(H_1 \mid \ldots \mid H_n) \models \varphi$ can be reduced
to a nil-model-checking problem. Moreover we will see that this last step
can be seen as a kind of constraints solving problem. Finally we present results
obtained with a prototype HCMC which deals with hybrid automata where the
slopes of variables belong to $\{0, 1\}$.

2 Hybrid Automata 

Notations. Let $V$ and $V'$ be two finite sets of variables with $V \cap V' = \emptyset$. A
valuation is a mapping from a set $V$ of variables into $\mathbb{R}$. For two valuations
$v \in \mathbb{R}^V$, $v' \in \mathbb{R}^{V'}$, we define the valuation $v.v' \in \mathbb{R}^{V \cup V'}$ by $(v.v')(x) = v(x)$
if $x \in V$ and $(v.v')(x) = v'(x)$ if $x \in V'$. If $|V| = n$, a valuation $v$ can be
interpreted as a vector $v$ of $\mathbb{R}^n$. We also recall the following useful definitions:

- A linear expression over $V$ is of the form $a + \sum_i a_i \cdot v_i$ with $a, a_i \in \mathbb{Z}$, $v_i \in V$.
  The set of linear constraints $\mathcal{C}(V)$ over $V$ is the set of formulas built using
  boolean connectives over expressions of the form $e_1 \bowtie e_2$ where $e_1$ and $e_2$
  are linear expressions and $\bowtie$ belongs to $\{=, <, >, \leq, \geq\}$. Given a valuation
  $v$ and a linear constraint $\gamma$, the boolean value $\gamma(v)$ describes whether $\gamma$ is
  satisfied by $v$ or not.
- A linear assignment over $V$ is of the form $v := A.v + b$ where $A$ is an $n \times n$
  matrix with coefficients in $\mathbb{Z}$ and $b$ is a vector of $\mathbb{Z}^n$. We denote an assignment
  $\alpha$ by a pair $(A, b)$ and write $v := \alpha(v)$ for $v := A.v + b$. $\mathcal{L}(V)$ is the set of
  linear assignments.
- A continuous change of variables is defined w.r.t. an element $d$ of $\mathbb{Z}^V$ (hereafter
  referred to as the activity vector or direction) corresponding to the first
  derivative of each variable: given $t \in \mathbb{R}_{\geq 0}$, the valuation $v + d.t$ is defined by
  $(v + d.t)(x) = v(x) + d(x).t$.

We will need a particular property on subsets of M" we refer to as d-strongly 
connection for some d G Z": A region r C K.” is d-strongly connected iff Vw, v' G r 
v' = V + d.t (for some t G R>o) implies WO < t' < t,v + d.t' G r. This means 
that if one can go from one point in r to another also in r following the direction 
d then all the intermediate points along the path are in r. If Future(r, d) (resp. 
Past(r, d)) denotes the future (resp. past) extension of r in direction d, then r 
is d-strongly connected is equivalent to r = Future(r, d) n Past(r, d) (then this 
condition can be effectively checked for). 
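The notations above translate almost directly into code. The following is a minimal sketch, assuming valuations are stored as Python dictionaries from variable names to numbers; the function names `concat`, `elapse` and `linear_expr` are illustrative and do not come from the paper.

```python
def concat(v, v_prime):
    """Valuation v.v' over two disjoint variable sets V and V'."""
    assert not (v.keys() & v_prime.keys()), "variable sets must be disjoint"
    return {**v, **v_prime}


def elapse(v, d, t):
    """Continuous change: (v + d.t)(x) = v(x) + d(x).t for every variable x."""
    return {x: v[x] + d[x] * t for x in v}


def linear_expr(a, coeffs):
    """Build e(v) = a + sum_i a_i * v(x_i), with coeffs mapping x_i to a_i."""
    return lambda v: a + sum(c * v[x] for x, c in coeffs.items())


# A stopwatch x (slope 1) next to a frozen variable y (slope 0).
v = {"x": 0, "y": 2}
d = {"x": 1, "y": 0}
later = elapse(v, d, 3)
guard = linear_expr(1, {"x": 2})      # the expression 1 + 2x
print(later, guard(later))
```

A linear constraint such as $e_1 \le e_2$ can then be evaluated by comparing the values of two such expressions on the same valuation.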

The model. Hybrid automata [ACH+95,AHH96,Hen96] are used to model systems
which combine discrete and continuous evolutions.

Definition 1 (Hybrid automaton) A hybrid automaton $H$ is a 7-tuple
$(N, l_0, V, A, E, Act, Inv)$ where:

— $N$ is a finite set of locations,

— $l_0 \in N$ is the initial location,

— $V$ is a finite set of real-valued variables,

— $A$ is a finite set of actions,

— $E \subseteq N \times \mathcal{C}(V) \times A \times \mathcal{L}(V) \times N$ is a finite set of
edges; $e = (l, \gamma, a, \alpha, l') \in E$ represents an edge from the location $l$ to the
location $l'$ with the guard $\gamma$, the label $a$ and the linear assignment $\alpha$.

— $Act \in (\mathbb{Z}^V)^N$ assigns an activity vector for $V$ to any location. $Act(l)(x)$
represents the first derivative of $x$ in location $l$.

— $Inv \in \mathcal{C}(V)^N$ assigns an invariant to any location. We require that for any
$l$, $Inv(l)$ is $Act(l)$-strongly connected. □



Example 1. As a running example we will use the scheduler given in [AHH96]
and depicted (*) in Figures 2 and 3. This scheduler Sched (Figure 3) can handle
two types of tasks: type $T_1$ and type $T_2$. Priority is given to tasks of type $T_2$,
and if a task of type $T_1$ is currently running it is preempted; measuring
the execution time of tasks of type $T_1$ therefore requires the use of a stopwatch
$y_1$. We also use a stopwatch $y_2$ to measure the execution time of tasks of type
$T_2$ (although it is not necessary, as tasks of type $T_2$ cannot be preempted).
Obviously, $y_1$ will be running at rate 1 in location $T_1$ and 0 otherwise. Tasks
of type $T_1$ take 4 time units and tasks of type $T_2$ take 8 time units. The initial
state of Sched is Idle.

(*) The automata are designed using the GasTeX package available at
http://www.liafa.jussieu.fr/~gastin/gastex.

Fig. 2. Automaton of the environment: Env

Fig. 3. Automaton of the scheduler: Sched

The number of pending and running tasks of type $i$ is given by the integers (*)
$k_i$ (integers are implemented as stopwatches with slope 0 in every location). The
arrivals (events $Int_1$ and $Int_2$) of the tasks are described by the (timed)
automaton Env (Figure 2). This automaton specifies that the interval between two
consecutive arrivals of tasks of type $T_1$ (resp. $T_2$) is more than 10 (resp. 20) time
units. □

(*) On the figures, $k_i^{+}$ stands for $k_i := k_i + 1$ and $k_i^{-}$ for $k_i := k_i - 1$.

Semantics of hybrid automata. The semantics of a hybrid automaton is given
by an (infinite) transition system. At any time, the configuration of the system
is a pair $(l, v)$ where $l$ is a location and $v$ a valuation. The configuration can
change in two different ways: a discrete change can occur when a transition
in $E$ is enabled in the configuration $(l, v)$, and a continuous change can occur
according to the evolution law of the variables (given by $Act(l)$) and as long
as the invariant $Inv(l)$ remains true. The initial configuration of the hybrid
automaton is $(l_0, v_0)$ with $v_0 = \{0\}^V$ (i.e. $v_0(x) = 0$ for all $x \in V$).



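Definition 2 (Semantics of a hybrid automaton) The semantics of a hybrid
automaton $H = (N, l_0, V, A, E, Act, Inv)$ is a labeled transition system
$S_H = (Q, q_0, \rightarrow)$ with $Q = N \times \mathbb{R}^V$, $q_0 = (l_0, v_0)$ the initial state
($v_0(x) = 0,\ \forall x \in V$) and $\rightarrow$ defined by:

— $(l, v) \xrightarrow{a} (l', v')$ iff there exists $(l, \gamma, a, \alpha, l') \in E$ s.t.
$\gamma(v) = tt$, $v' = \alpha(v)$ and $Inv(l')(v') = tt$

— $(l, v) \xrightarrow{\epsilon(t)} (l', v')$ iff $l = l'$, $v' = v + Act(l).t$ and
$\forall\, 0 \le t' \le t,\ Inv(l)(v + Act(l).t') = tt$

A run of a hybrid automaton $H$ is a path in $S_H$. □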



Remark 1. Due to the $Act(l)$-strong connectedness, the condition

$\forall\, 0 \le t' \le t,\ Inv(l)(v + Act(l).t') = tt$

is equivalent to $Inv(l)(v + Act(l).t) = tt$ whenever $Inv(l)(v) = tt$.
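The two kinds of transitions can be sketched in a few lines of code. The sketch below is illustrative only: the class and method names are assumptions, guards, assignments and invariants are modelled as plain Python functions on dictionary valuations, and the delay step relies on Remark 1 (only the two endpoints are checked against the invariant). The toy automaton is hypothetical and is not the scheduler of Example 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Valuation = Dict[str, float]


@dataclass
class Edge:
    source: str
    guard: Callable[[Valuation], bool]
    action: str
    assign: Callable[[Valuation], Valuation]
    target: str


@dataclass
class HybridAutomaton:
    init: str
    variables: List[str]
    edges: List[Edge]
    act: Dict[str, Dict[str, int]]                # Act(l)(x): slope of x in l
    inv: Dict[str, Callable[[Valuation], bool]]   # Inv(l)

    def initial(self):
        """Initial configuration (l0, v0) with v0(x) = 0 for every x."""
        return self.init, {x: 0 for x in self.variables}

    def discrete(self, l, v, a):
        """All (l', v') such that (l, v) --a--> (l', v')."""
        successors = []
        for e in self.edges:
            if e.source == l and e.action == a and e.guard(v):
                v2 = e.assign(v)
                if self.inv[e.target](v2):        # target invariant must hold
                    successors.append((e.target, v2))
        return successors

    def delay(self, l, v, t):
        """(l, v + Act(l).t) if the delay is allowed, else None.

        By Remark 1 it is enough to test the invariant at both endpoints.
        """
        v2 = {x: v[x] + self.act[l][x] * t for x in v}
        if t >= 0 and self.inv[l](v) and self.inv[l](v2):
            return l, v2
        return None


# Hypothetical one-task automaton: y runs at rate 1 in Run until it reaches 4.
toy = HybridAutomaton(
    init="Run",
    variables=["y"],
    edges=[Edge("Run", lambda v: v["y"] == 4, "End",
                lambda v: {**v, "y": 0}, "Idle")],
    act={"Run": {"y": 1}, "Idle": {"y": 0}},
    inv={"Run": lambda v: v["y"] <= 4, "Idle": lambda v: True},
)
```

Calling `toy.delay("Run", {"y": 0}, 5)` is rejected because the invariant of Run fails at the endpoint, while a delay of 4 followed by the End edge is accepted.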

Parallel composition of hybrid automata. It is convenient to describe a system as
a parallel composition of hybrid automata. To this end, we use the classical
composition notion based on a synchronization function à la Arnold-Nivat [Arn94].
Let $H_1, \ldots, H_n$ be $n$ hybrid automata with
$H_i = (N_i, l_{0,i}, V_i, A, E_i, Act_i, Inv_i)$. A synchronization function $f$ is a
partial function from $(A \cup \{\bullet\})^n$ to $A$, where $\bullet$ is a special symbol used
when an automaton is not involved in a step of the global system. Note that $f$ is a
synchronization function with renaming. We denote by $(H_1 \mid \ldots \mid H_n)_f$ the
parallel composition of the $H_i$'s w.r.t. $f$. The configurations of
$(H_1 \mid \ldots \mid H_n)_f$ are pairs $(l, v)$ with
$l = (l_1, \ldots, l_n) \in N_1 \times \ldots \times N_n$ and $v = v_1 \cdots v_n$ with (*)
$v_i \in \mathbb{R}^{V_i}$ (we assume that all sets $V_i$ of variables are disjoint).
Then the semantics of a synchronized product is also a transition system: the
synchronized product can do a discrete transition if all the components agree to,
and time can progress in the synchronized product also if all the components
agree to. This is formalized by the following definition:

(*) $v_i$ is the restriction of $v$ to $V_i$.

Definition 3 (Parallel composition of hybrid automata) Let $H_1, H_2, \ldots, H_n$
be $n$ hybrid automata with $H_i = (N_i, l_{0,i}, V_i, A, E_i, Act_i, Inv_i)$, and $f$ a
(partial) synchronization function $(A \cup \{\bullet\})^n \to A$. The semantics of
$(H_1 \mid \ldots \mid H_n)_f$ is a labeled transition system $S = (Q, q_0, \rightarrow)$ with
$Q = N_1 \times \ldots \times N_n \times \mathbb{R}^{V_1 \cup \ldots \cup V_n}$, $q_0$ the initial
state $((l_{0,1}, \ldots, l_{0,n}), v_0)$ and $\rightarrow$ defined by:

— $(l, v) \xrightarrow{b} (l', v')$ iff there exists
$(a_1, \ldots, a_n) \in (A \cup \{\bullet\})^n$ s.t. $f(a_1, \ldots, a_n) = b$ and for any $i$ we have:

  . If $a_i = \bullet$, then $l'_i = l_i$ and $v'_i = v_i$,

  . If $a_i \in A$, then $(l_i, v_i) \xrightarrow{a_i} (l'_i, v'_i)$.

— $(l, v) \xrightarrow{\epsilon(t)} (l, v')$ iff $\forall i \in \{1, \ldots, n\}$ we have
$(l_i, v_i) \xrightarrow{\epsilon(t)} (l_i, v'_i)$. □

Example 2. The synchronization function $f$ for the parallel composition
$(Sched \mid Env)$ of the scheduler and the environment of Example 1 is given by:

$f(Int_i, Int_i) = Int_i$ and $f(End_i, \bullet) = End_i$

for any $i$ in $\{1, 2\}$.
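A partial synchronization function is conveniently stored as a lookup table. The sketch below encodes the function of Example 2 under the assumption that actions are strings and that the no-action symbol is written `"*"`; the names `sync_table`, `sync` and `vectors_for` are illustrative.

```python
IDLE = "*"   # stands for the special "not involved" symbol

# f(Int_i, Int_i) = Int_i and f(End_i, *) = End_i for i in {1, 2};
# the first component is Sched, the second is Env.
sync_table = {}
for i in (1, 2):
    sync_table[(f"Int{i}", f"Int{i}")] = f"Int{i}"
    sync_table[(f"End{i}", IDLE)] = f"End{i}"


def sync(table, vector):
    """Global label f(a_1, ..., a_n), or None where f is undefined."""
    return table.get(tuple(vector))


def vectors_for(table, label):
    """All action vectors that synchronize to the given global label."""
    return [vec for vec, b in table.items() if b == label]


print(sync(sync_table, ("Int1", "Int1")))
print(sync(sync_table, ("Int1", IDLE)))
```

Because the table is partial, an arrival `Int1` that Env performs alone has no global label, which is how the product forbids unsynchronized moves.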

3 A Modal Logic for Hybrid Automata 

3.1 Syntax of $L_\nu^h$

We use a fixed point logic $L_\nu^h$ to specify the properties of hybrid automata. This
logic is an extension (with variables) of the logic $L_\nu$ presented in [LL95]. It allows
only maximal fixed points and consequently the specification of safety properties
(this includes time-bounded liveness properties).

Definition 4 ($L_\nu^h$: $L_\nu$ for hybrid systems) Let $K$ be a finite set of variables,
$Id$ a set of identifiers, and $A$ an alphabet of actions. The set of $L_\nu^h$ formulas
over $K$, $A$ and $Id$ is defined inductively by:

$L_\nu^h \ni \varphi, \psi ::= \varphi \lor \psi \mid \varphi \land \psi \mid \langle a \rangle \varphi \mid [a] \varphi \mid \langle \delta_d \rangle \varphi \mid [\delta_d] \varphi \mid Z \mid \alpha \text{ in } \varphi \mid \gamma$

where $a \in A$, $\alpha \in \mathcal{L}(K)$, $\gamma \in \mathcal{C}(K)$, $d \in \mathbb{Z}^K$ and $Z \in Id$. □

3.2 Semantics of $L_\nu^h$

It is straightforward to define the tt and ff operators and moreover implication
$\varphi \Rightarrow \psi$ whenever no identifier occurs in $\varphi$ (for example,
$c \Rightarrow \varphi$ with $c \in \mathcal{C}(K)$), since this fragment is closed under negation.
The meaning of identifiers is given by a declaration $\mathcal{D} : Id \to L_\nu^h$.

Given a parallel composition $S$ of hybrid automata, we interpret $L_\nu^h$ formulas
w.r.t. extended states $(l, v, u)$ where $(l, v)$ is a state of $S$ and $u$ is a valuation for
the $K$ variables (namely formula variables). Intuitively $\langle a \rangle$ (resp. $[a]$) denotes
the existential (resp. universal) quantification over a-transitions, and
$\langle \delta_d \rangle$ (resp. $[\delta_d]$) denotes the existential (resp. universal)
quantification over continuous transitions of $S$ w.r.t. the direction $d$ for the variables
of $K$. The formula $\alpha \text{ in } \varphi$ means that after the change of formula variables
according to $\alpha$ ($\alpha \in \mathcal{L}(K)$), the new extended state verifies $\varphi$. The
linear constraint $\gamma$ over $K$ variables holds for $(l, v, u)$ whenever $\gamma(u) = tt$.
Finally $Z$ holds for an extended state if it belongs to the largest solution of the
equation $Z = \mathcal{D}(Z)$. For formulas of the type $x := 0 \text{ in } \varphi$ we simply write
$x \text{ in } \varphi$. Moreover when no formula variable occurs in a formula, we write
$\langle \delta \rangle$ (resp. $[\delta]$) instead of $\langle \delta_\emptyset \rangle$ (resp.
$[\delta_\emptyset]$). Formally the semantics of $L_\nu^h$ is given in Table 1.

$(l, v, u) \models \varphi \land \psi$  iff  $(l, v, u) \models \varphi$ and $(l, v, u) \models \psi$
$(l, v, u) \models \varphi \lor \psi$  iff  $(l, v, u) \models \varphi$ or $(l, v, u) \models \psi$
$(l, v, u) \models \langle a \rangle \varphi$  iff  $\exists (l', v')$ s.t. $(l, v) \xrightarrow{a} (l', v')$ and $(l', v', u) \models \varphi$
$(l, v, u) \models [a] \varphi$  iff  $\forall (l', v')$, $(l, v) \xrightarrow{a} (l', v')$ implies $(l', v', u) \models \varphi$
$(l, v, u) \models \langle \delta_d \rangle \varphi$  iff  $\exists t \in \mathbb{R}_{\ge 0}$ s.t. $(l, v) \xrightarrow{\epsilon(t)} (l, v + Act(l).t)$ and $(l, v + Act(l).t, u + d.t) \models \varphi$
$(l, v, u) \models [\delta_d] \varphi$  iff  $\forall t \in \mathbb{R}_{\ge 0}$, $(l, v) \xrightarrow{\epsilon(t)} (l, v + Act(l).t)$ implies $(l, v + Act(l).t, u + d.t) \models \varphi$
$(l, v, u) \models \gamma$  iff  $\gamma(u) = tt$
$(l, v, u) \models \alpha \text{ in } \varphi$  iff  $(l, v, \alpha(u)) \models \varphi$
$(l, v, u) \models Z$  iff  $(l, v, u)$ belongs to the maximal solution of $Z = \mathcal{D}(Z)$

Table 1. Semantics of the modal logic $L_\nu^h$



3.3 Examples of Formulas 

As $L_\nu^h$ is a conservative extension of $L_\nu$, we can derive classical temporal
operators as in the examples below:

— To express that the action error is never performed, we can use the following
equation:

$X = \bigwedge_{a \in A} [a]X \;\land\; [error]\,ff \;\land\; [\delta]X$

We denote this formula by $ALWAYS_A([error]\,ff)$. In this example there is no
formula variable and the $[\delta]$ deals only with continuous transitions of the
system $S$ which is being specified.

— To express that the number of a-transitions is less than 100 along any run,
we can use the formula $x := 0 \text{ in } X$ with $X$ defined by:

$X = (x < 100) \;\land \bigwedge_{b \in A \setminus \{a\}} [b]X \;\land\; [a](x := x + 1 \text{ in } X) \;\land\; [\delta_0]X$

In this case, $x$ is just a discrete formula variable.

— More generally $ALWAYS_{A',E}(\varphi)$, with $A' \subseteq A$ a set of actions and
$E \subseteq \mathbb{Z}^K$ a finite set of directions for the $K$ variables, states that
$\varphi$ holds in any state reachable using action transitions in $A'$ and delay
transitions with derivatives in $E$ for the $K$ variables. $ALWAYS_{A',E}(\varphi)$ can be
defined with the following equation:

$X = \varphi \;\land \bigwedge_{b \in A'} [b]X \;\land \bigwedge_{e \in E} [\delta_e]X$

The operator does not contain the evolution law of automata variables since they
only depend on the system.

— In the same manner, the weak until operator $\varphi_1\, Until_{A',E}\, \varphi_2$ is defined
as the largest fixed point of:

$X = \varphi_2 \lor \big(\varphi_1 \land \bigwedge_{b \in A'} [b]X \land \bigwedge_{e \in E} [\delta_e]X\big)$

— As for timed automata, the following fixed point equation, interpreted over
a parallel composition of two hybrid automata (without invariant (*)) $H_1$ and
$H_2$ with the synchronization function $f(a, \bullet) = a_1$ and $f(\bullet, a) = a_2$ for any
$a \in A$, expresses the (strong) bisimilarity of $H_1$ and $H_2$:

$X = \bigwedge_{a \in A} [a_1]\langle a_2 \rangle X \;\land \bigwedge_{a \in A} [a_2]\langle a_1 \rangle X \;\land\; [\delta]X$

Example 3. We want to specify on our scheduling system that a task of type
$T_2$ never waits: as a new arrival of a task $T_2$ will preempt a running $T_1$ task,
this amounts to checking that $k_2 \le 1$, as $k_2$ gives the number of pending and
running $T_2$ tasks; we would also like to prove that at most one task of type $T_1$
may be pending, i.e. $k_1 \le 2$. We express the previous property as a reachability
property of an error transition. Let Sched' be the hybrid automaton obtained
from Sched by adding error-transitions $(l, (k_1 > 2 \lor k_2 > 1), error, \emptyset, l)$ for
$l \in \{Idle, T_1, T_2\}$. It remains to check that error-transitions cannot be fired, i.e.
$(Env \mid Sched')_f \models ALWAYS_A([error]\,ff)$, that is:

$X = [Int_1]X \land [Int_2]X \land [End_1]X \land [End_2]X \land [error]\,ff \land [\delta]X$

□
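On a finite transition system, the largest solution of a safety equation such as $X = \bigwedge_a [a]X \land [error]\,ff$ can be computed by starting from all states and removing violators until nothing changes. The sketch below illustrates only this maximal fixed point idea: it assumes an explicit, finite state space (which a hybrid automaton does not have in general), and the function name and the encoding of transitions are illustrative.

```python
def always_no_error(states, trans, error="error"):
    """Greatest solution of X = AND_a [a]X and [error]ff on a finite system.

    trans maps a state to a list of (label, successor) pairs; delay steps can
    be encoded as transitions carrying a dedicated label.
    """
    X = set(states)                       # start from the top element
    changed = True
    while changed:
        changed = False
        for s in list(X):
            ok = all(label != error and t in X
                     for label, t in trans.get(s, []))
            if not ok:                    # s can fire error or leave X
                X.discard(s)
                changed = True
    return X


# State 2 can fire error, so 2 and every state that can reach it are removed.
trans = {0: [("a", 1)], 1: [("b", 0), ("a", 2)], 2: [("error", 3)], 3: []}
safe = always_no_error({0, 1, 2, 3}, trans)
print(safe)
```

The property holds for the system exactly when its initial state is still in the returned set.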



4 Compositional Verification of Hybrid Automata 

4.1 Quotient Construction 

Given a hybrid system $(H_1 \mid \cdots \mid H_n)_f$ and an $L_\nu^h$ formula $\varphi$, we want
to build a formula $\varphi /_f H_n$ s.t. $(H_1 \mid \cdots \mid H_n)_f \models \varphi$ iff
$(H_1 \mid \cdots \mid H_{n-1}) \models \varphi /_f H_n$. The definition of the quotient
construction is given in Table 2. Note that it is easier to define it w.r.t. a binary
synchronization function between $(H_1 \mid \cdots \mid H_{n-1})$ and $H_n$, but it is
straightforward to decompose $(H_1 \mid \cdots \mid H_n)_f$ in such a manner.

For the no-action label, the conventions are the following:
$\langle \bullet \rangle \varphi = [\bullet] \varphi = \varphi$. Note that the variables in $V_n$ become
formula variables in the quotient formula; this entails that the operators
$\langle \delta_d \rangle$ and $[\delta_d]$ occurring in $\varphi /_f H_n$ deal with $K \cup V_n$.
Moreover given $d \in \mathbb{Z}^K$ and $d' \in \mathbb{Z}^{V_n}$, $d.d'$ denotes the corresponding
integer activity vector over $K \cup V_n$. Finally note that the quotienting of identifiers
may increase (**) the number of fixed point equations in $\mathcal{D}'$; in the worst case, the
size of $\mathcal{D}'$ is $|\mathcal{D}|.|N|$ where $|N|$ is the number of locations of the quotiented
automaton. This will motivate the use of reduction methods. Now we have:

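(*) The case of automata with invariant can be handled by a more complex formula
based on the same idea.

(**) A new identifier can be added to $\mathcal{D}'$ for any location $l$ and any $Z \in Id$.

$(\varphi_1 \land \varphi_2) /_f\, l = (\varphi_1 /_f\, l) \land (\varphi_2 /_f\, l)$
$(\varphi_1 \lor \varphi_2) /_f\, l = (\varphi_1 /_f\, l) \lor (\varphi_2 /_f\, l)$
$(\langle a \rangle \varphi) /_f\, l = \bigvee_{(l, \gamma, c, \alpha, l') \in E_n,\ f(b,c) = a} \gamma \land (\alpha \text{ in } Inv(l')) \land \langle b \rangle (\alpha \text{ in } (\varphi /_f\, l'))$
$([a] \varphi) /_f\, l = \bigwedge_{(l, \gamma, c, \alpha, l') \in E_n,\ f(b,c) = a} \big(\gamma \land (\alpha \text{ in } Inv(l'))\big) \Rightarrow [b](\alpha \text{ in } (\varphi /_f\, l'))$
$(\langle \delta_d \rangle \varphi) /_f\, l = \langle \delta_{d.Act(l)} \rangle (Inv(l) \land \varphi /_f\, l)$
$([\delta_d] \varphi) /_f\, l = [\delta_{d.Act(l)}] (Inv(l) \Rightarrow \varphi /_f\, l)$
$Z /_f\, l = Z_l$
$(\alpha \text{ in } \varphi) /_f\, l = \alpha \text{ in } (\varphi /_f\, l)$
$\gamma /_f\, l = \gamma$

Table 2. Quotienting rules to obtain $\varphi /_f\, l$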



Theorem 1. Let $((H_1 \mid \cdots \mid H_{n-1})_{f'} \mid H_n)_f$ be a system of $n$ hybrid
automata $H_i = (N_i, l_{0,i}, V_i, A_i, E_i, Act_i, Inv_i)$ and $\varphi$ an $L_\nu^h$ formula over
$K$. If $((\rho, l), v.w)$ is a configuration of $((H_1 \mid \cdots \mid H_{n-1})_{f'} \mid H_n)_f$
with $v \in \mathbb{R}^{V_1 \cup \ldots \cup V_{n-1}}$, $w \in \mathbb{R}^{V_n}$ and $u \in \mathbb{R}^K$
s.t. $Inv(l)(w) = tt$, then we have:

$((\rho, l), v.w, u) \models_{\mathcal{D}} \varphi$ if and only if $(\rho, v, w.u) \models_{\mathcal{D}'} \varphi /_f\, l$ □

A sketch of the proof of Theorem 1 is given in Appendix A. The following
definition shifts the meaning of the quotient to hybrid automata:

Definition 5 (Quotient of $\varphi$ by a hybrid automaton) Let
$H = (N, l_0, V, A, E, Act, Inv)$ and $H' = (N', l'_0, V', A', E', Act', Inv')$ be hybrid
automata with $v_0$, $v'_0$ their initial valuations and $f$ a synchronization function for
$H$ and $H'$. Let $\varphi$ be an $L_\nu^h$ formula with clocks over $K$, $\mathcal{D}$ a declaration
and $u_0$ the valuation $\{0\}^K$. Then $(H \mid H')_f \models_{\mathcal{D}} \varphi$ iff
$((l_0, l'_0), (v_0.v'_0), u_0) \models_{\mathcal{D}} \varphi$. We also define the quotient
$\varphi /_f H$ to be $\varphi /_f\, l_0$. Consequently $(H \mid H')_f \models_{\mathcal{D}} \varphi$ iff
$H' \models_{\mathcal{D}'} (\varphi /_f H)$, where $\mathcal{D}'$ deals with the set of identifiers which
are introduced during quotienting. □

Then verifying a system can be reduced to a nil (*) model-checking problem:

Corollary 1. Let $(\cdots (H_1 \mid H_2)_{f_1} \cdots \mid H_n)_{f_{n-1}}$ be a hybrid system,
$(l_0, v_0)$ its initial configuration, $\varphi$ an $L_\nu^h$ formula, $\mathcal{D}$ a declaration
and $u_0$ the valuation $\{0\}^K$. If $Inv(l_0)(v_0) = tt$, we have:

$((l_{0,1}, \ldots, l_{0,n}), v_0, u_0) \models_{\mathcal{D}} \varphi \;\Leftrightarrow\; nil \models_{\mathcal{D}'} (\varphi /_{f_{n-1}} H_n \cdots /_{f_0} H_1)$ □

(*) nil can only let time elapse without performing any action; it is an automaton
with no variable and no edge.

4.2 Simplification Strategies

The size of $\varphi/H$ is in $O(|\varphi|.|H|)$ if we consider the formula as a dag (viz.
represented by a data structure with sharing of sub-formulas). Therefore the final
quotient formula $\varphi / H_n / \ldots / H_1$ may be of size exponential in
$|H_1| + \ldots + |H_n| + |\varphi|$.

This blow-up of the quotient formula corresponds to the state explosion problem
which occurs in the classical model-checking approach (for example, model-checking
for the alternation-free $\mu$-calculus over a product of classical automata is
EXPTIME-complete [KVW98] while it is only P-complete for one automaton).
We are going to apply syntactical and semantical reductions over the quotient
formula after each quotienting in order (to try) to keep in the quotient formula
$\varphi/H$ only the part of the behavior of $H$ which is relevant w.r.t. the property
$\varphi$ we want to check.

Many simplifications used for timed automata also apply here: Boolean
Simplifications ($tt \land \varphi = \varphi$, $\langle a \rangle ff = ff$, etc.), Trivial Equation
Elimination (equations of the form $X = [a]X \land [\delta_d]X$ have $X = tt$ as solution,
etc.), and Equivalence Reduction (if two identifiers $X$ and $Y$ are equivalent we may
collapse them into a single identifier).
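The Boolean simplifications are purely syntactic and can be sketched as a bottom-up rewrite. The encoding below is an assumption made for illustration: formulas are nested tuples, `("dia", label, phi)` and `("box", label, phi)` stand for the existential and universal modalities (over an action or a direction), and anything else is treated as an atom. This is not the representation used by CMC or HCMC.

```python
TT, FF = ("tt",), ("ff",)


def simplify(phi):
    """Bottom-up Boolean simplification of a formula given as nested tuples."""
    op = phi[0]
    if op in ("and", "or"):
        left, right = simplify(phi[1]), simplify(phi[2])
        unit, zero = (TT, FF) if op == "and" else (FF, TT)
        if zero in (left, right):         # ff and phi = ff, tt or phi = tt
            return zero
        if left == unit:                  # tt and phi = phi, ff or phi = phi
            return right
        if right == unit:
            return left
        return (op, left, right)
    if op in ("dia", "box"):
        body = simplify(phi[2])
        if op == "dia" and body == FF:    # <a>ff = ff
            return FF
        if op == "box" and body == TT:    # [a]tt = tt
            return TT
        return (op, phi[1], body)
    return phi                            # tt, ff, identifiers, constraints


formula = ("and", TT, ("or", ("dia", "a", FF), ("id", "X")))
print(simplify(formula))
```

Working bottom-up means that a sub-formula collapsing to tt or ff can trigger further simplifications in its parents within a single pass.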

Hitzone Reduction consists in computing, for any atomic constraint
$\xi \in \mathcal{C}(K)$ occurring in $\varphi$, an upper approximation $S_\xi$ of the set of
valuations for $K$ (i.e. a linear constraint over the formula variables) where the truth
value of $\xi$ is needed to decide the truth value of $\varphi$ for the initial configuration.
These sets are obtained by a (forward) fixed point computation. Afterwards, $\xi$ can
be replaced by tt (resp. ff) whenever (*) $S_\xi \subseteq [\![\xi]\!]$ (resp.
$S_\xi \cap [\![\xi]\!] = \emptyset$).

(*) $[\![C]\!]$ denotes the set of valuations satisfying the linear constraint $C$.

We illustrate the hitzone reduction with the following example. Consider the
formula $\varphi = X_0$ with:

$X_0 = (x_2 < 3 \Rightarrow [a](x_2 := 0 \text{ in } X_1)) \land \langle b \rangle (x_1 := 0 \text{ in } X_0) \land [\delta_{(1,1)}]X_0$
$X_1 = (x_2 < 1 \Rightarrow [b]X_2) \land [\delta_{(0,1)}](x_2 < 4 \Rightarrow X_1)$
$X_2 = [c]X_2 \land (x_2 > x_1 - 3)$

First we compute $S_{X_0}$ (resp. $S_{X_1}$, $S_{X_2}$) corresponding to the set of $K$
valuations where the truth value of $X_0$ (resp. $X_1$, $X_2$) is needed to decide whether
$\varphi$ holds for the initial configuration. This is done by a forward fixed point
computation: we start with $S_{X_0} = [\![x_1 = x_2 = 0]\!]$ and
$S_{X_1} = S_{X_2} = [\![ff]\!]$; 4 iterations are necessary to obtain the result:

— $S_{X_0} = [\![x_1 = x_2]\!]$, $S_{X_1} = [\![x_1 = x_2 = 0]\!]$ and $S_{X_2} = [\![ff]\!]$,

— $S_{X_0} = [\![x_1 = x_2 \lor x_1 = 0 \le x_2]\!]$,
$S_{X_1} = [\![x_2 = 0 \le x_1 < 3 \lor x_1 = 0 \le x_2 < 4]\!]$ and $S_{X_2} = [\![x_1 = x_2 = 0]\!]$,

— $S_{X_0} = [\![0 \le x_1 \le x_2]\!]$, $S_{X_1} = [\![0 \le x_1 < 3 \land 0 \le x_2 < 4]\!]$ and
$S_{X_2} = [\![x_2 = 0 \le x_1 < 3 \lor x_1 = 0 \le x_2 < 1]\!]$,

— $S_{X_0} = [\![0 \le x_1 \le x_2]\!]$, $S_{X_1} = [\![0 \le x_1 < 3 \land 0 \le x_2 < 4]\!]$ and
$S_{X_2} = [\![0 \le x_1 < 3 \land 0 \le x_2 < 1]\!]$.

This computation of the sets $S_X$ is done by a forward propagation of linear
constraints; the sub-formulas $(\alpha \text{ in } \varphi)$, $([\delta_d]\varphi)$ and
$(\gamma \Rightarrow \varphi)$ are seen as predicate transformers.

Figure 4 shows the final value of $S_{X_0}$, $S_{X_1}$ and $S_{X_2}$. Therefore the constraint
$(x_2 > x_1 - 3)$ in $X_2$ can be reduced to tt since $S_{X_2} \subseteq [\![x_2 > x_1 - 3]\!]$.
Then trivial equation elimination allows us to reduce $X_2$ (resp. $X_1$) to tt. Finally
we can simplify $X_0$ to $\langle b \rangle (x_1 := 0 \text{ in } X_0) \land [\delta_{(1,1)}]X_0$.




Fig. 4. Example of hitzones computation. 
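To make the forward propagation concrete, one iteration of the example can be worked out by hand. The derivation below is a sketch under two assumptions: the guards of the declaration are read as $x_2 < 3$, $x_2 < 1$ and $x_2 < 4$ with strict upper bounds, and $\mathrm{Future}_d(S)$ is shorthand (introduced here) for the time successors of $S$ in direction $d$. It starts from the result of the first iteration, $S_{X_0} = [\![x_1 = x_2]\!]$, $S_{X_1} = [\![x_1 = x_2 = 0]\!]$, $S_{X_2} = [\![ff]\!]$, and computes the second one.

```latex
\begin{align*}
S_{X_0} &\leftarrow S_{X_0}
  \;\cup\; \underbrace{\{(0, x_2) \mid (x_1, x_2) \in S_{X_0}\}}_{\langle b \rangle (x_1 := 0 \text{ in } X_0)}
  \;\cup\; \underbrace{\mathrm{Future}_{(1,1)}(S_{X_0})}_{[\delta_{(1,1)}] X_0}
  = [\![\, x_1 = x_2 \lor x_1 = 0 \le x_2 \,]\!] \\
S_{X_1} &\leftarrow S_{X_1}
  \;\cup\; \underbrace{\{(x_1, 0) \mid (x_1, x_2) \in S_{X_0},\ x_2 < 3\}}_{x_2 < 3 \Rightarrow [a](x_2 := 0 \text{ in } X_1)}
  \;\cup\; \underbrace{\big(\mathrm{Future}_{(0,1)}(S_{X_1}) \cap [\![ x_2 < 4 ]\!]\big)}_{[\delta_{(0,1)}](x_2 < 4 \Rightarrow X_1)}
  = [\![\, x_2 = 0 \le x_1 < 3 \lor x_1 = 0 \le x_2 < 4 \,]\!] \\
S_{X_2} &\leftarrow S_{X_2}
  \;\cup\; \underbrace{\big(S_{X_1} \cap [\![ x_2 < 1 ]\!]\big)}_{x_2 < 1 \Rightarrow [b] X_2}
  = [\![\, x_1 = x_2 = 0 \,]\!]
\end{align*}
```

Each right-hand side uses the sets from the previous iteration, which is why a constraint reaches $S_{X_2}$ one step after it reaches $S_{X_1}$.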



Of course, the computation of the $S_X$'s is based on operations over polyhedra and
may not terminate (contrary to the other simplifications). Nevertheless it is
possible to use coarser over-approximations of the $S_X$'s to ensure termination; this
clearly leads to less efficient simplifications (i.e. allowing fewer atomic constraint
reductions).

Sharing variables. In the definitions of $L_\nu^h$ and $H_i$, we assume that the variable
sets $V_i$ and $K$ are disjoint. This hypothesis is important only if hitzone
simplification is applied, because this reduction assumes that transitions of automata
which have not yet been quotiented do not modify the value of formula variables
(e.g. the sub-formula $(x < 10) \land \langle a \rangle (x > 10)$ is reduced to ff because
it is assumed that $x$ is not updated by performing the a-transition). If we do
not apply the hitzone simplification, automata variables can be used inside the
formula and variables can even be shared between several automata. In these
cases, the hitzone reduction can only be applied after the quotienting of the last
automaton that uses the shared variables (i.e. when all the control part they are
concerned with is present in the quotient formula).

Nil-model-checking and constraints solving. Let $\varphi'$ be $\varphi / H_n / \ldots / H_1$.
Note that $\varphi'$ expresses a property over nil and then we can assume (*) that no
$\langle a \rangle$ and no $[a]$ occur in $\varphi'$. In fact, deciding whether $\varphi'$ holds for nil
requires computing the fixed point of the equations $X_i = \mathcal{D}'(X_i)$ where
$\mathcal{D}'$ is the definition of the identifiers occurring in $\varphi'$. But an extended
configuration of nil is just a $|K'|$-tuple of real numbers, and then $nil \models \varphi'$
can be seen as a constraints problem over sets of $|K'|$-tuples of real numbers.

(*) For the nil process, we have $\langle a \rangle \varphi = ff$ and $[a] \varphi = tt$.

For example, $nil \models X_0$ with the following declaration:

$X_0 = (x_2 = 1 \Rightarrow (x_2 := 0 \text{ in } X_1)) \land (x_1 < 1 \lor x_1 > 1) \land [\delta_{(0,1)}](x_2 \le 1 \Rightarrow X_0)$
$X_1 = (x_2 = 1 \Rightarrow (x_2 := 0 \text{ in } X_0)) \land [\delta_{(1,1)}](x_2 \le 1 \Rightarrow X_1)$

is equivalent to the problem of deciding whether $(0, 0) \in S_0$ where $S_0$ and $S_1$
($\subseteq \mathbb{R} \times \mathbb{R}$) are defined as the maximal sets verifying:

If $(x_1, x_2) \in S_0$ then $(x_2 = 1 \Rightarrow (x_1, 0) \in S_1)$ and $(x_1 < 1 \lor x_1 > 1)$ and
$(\forall t \ge 0,\ x_2 + t \le 1 \Rightarrow (x_1, x_2 + t) \in S_0)$

If $(x_1, x_2) \in S_1$ then $(x_2 = 1 \Rightarrow (x_1, 0) \in S_0)$ and
$(\forall t \ge 0,\ x_2 + t \le 1 \Rightarrow (x_1 + t, x_2 + t) \in S_1)$

Therefore in the compositional verification, the first step deals with a high level
description of the problem (a parallel composition of hybrid automata and a
specification written in $L_\nu^h$) while the second step treats a more basic description
and could be analyzed by a constraints solver over real numbers.

In [ACH+95] reachability problems of the form “is it possible to reach a
configuration verifying $\varphi_0$?” (where $\varphi_0$ is an atomic constraint) are reduced to
a fixed point computation of an equation system which encodes the behavior of
the hybrid system. In fact if we apply our quotient technique to the reachability
formula $ALWAYS(\neg \varphi_0)$ we obtain the same kind of equations (the system can be
smaller thanks to the simplification step), and then the compositional method can be
seen as an extension of the previous results over hybrid systems, since it deals
with any kind of property which can be expressed with the alternation-free modal
$\mu$-calculus.



5 Hybrid CMC and Examples 

Hybrid CMC. We extended CMC (a tool implementing the compositional method
for timed automata [LL98]) in order to handle a subclass of hybrid systems
where variables have slopes in $\{0, 1\}$. Moreover we use classical constraints of
TA (viz. $x \bowtie m$ or $x - y \bowtie m$) and assignments are of the form $x := y + m$ or
$x := m$. The same restrictions apply to the modal logic handled by HCMC. This
subclass of hybrid systems remains very expressive (model checking is clearly
undecidable) and allows us to model a large variety of systems [BF99a].

Given a hybrid system and a modal specification, HCMC allows us to build the
simplified quotient formula. Moreover there is a procedure to (try to) solve nil
model-checking problems. We use a DBM-like data structure to represent
constraints over variables in the algorithms [Dil89]. This choice motivates the
restriction to slopes in $\{0, 1\}$ (other slopes would require extended forms of
constraints) but even in this framework some operations (like
$\mathrm{Future}(d, z) = \{v + d.t \mid v \in z,\ t \in \mathbb{R}_{\ge 0}\}$) can only be approximated.
These abstractions ensure termination of the two steps (simplifications and nil
model-checking algorithm) of the compositional approach. However, the problem
being undecidable, an answer not in {yes, no} may occur. This third result
corresponds to cases where the abstractions used for the second step are too large to
be able to conclude. The prototype HCMC is available at the web address:
http://www.lsv.ens-cachan.fr/~fl.
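For readers unfamiliar with DBMs, the following minimal sketch shows the idea of the data structure. It is not HCMC's implementation: it assumes non-strict integer bounds only, stores the bound of $x_i - x_j \le c$ in `D[i][j]` with index 0 standing for the constant 0, and its only time-successor operator covers the classical case where every variable has slope 1 (the mixed slopes in $\{0, 1\}$ discussed above are exactly what forces approximations). All function names are illustrative.

```python
import math

INF = math.inf


def new_dbm(n):
    """DBM over x_1..x_n with no constraint except x_i >= 0."""
    D = [[INF] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][i] = 0
        D[0][i] = 0                       # 0 - x_i <= 0
    return D


def constrain(D, i, j, c):
    """Add the constraint x_i - x_j <= c (index 0 is the constant 0)."""
    D[i][j] = min(D[i][j], c)


def close(D):
    """Canonical form: tighten all bounds by all-pairs shortest paths."""
    n = len(D)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def is_empty(D):
    """On a closed DBM, a negative diagonal entry means an empty zone."""
    return any(D[i][i] < 0 for i in range(len(D)))


def future_all_slopes_one(D):
    """Time successors of a closed DBM when every variable has slope 1."""
    for i in range(1, len(D)):
        D[i][0] = INF                     # drop the upper bounds x_i <= c
    return D


zone = new_dbm(2)
constrain(zone, 1, 0, 3)                  # x_1 <= 3
constrain(zone, 2, 1, -1)                 # x_2 <= x_1 - 1
close(zone)
print(zone[2][0], zone[0][1])             # derived bounds on x_2 and -x_1
```

Closing the matrix derives the implicit bounds $x_2 \le 2$ and $x_1 \ge 1$; keeping zones in this canonical form is what makes inclusion and emptiness tests cheap.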

Examples. The current version of HCMC has been applied over several examples. 
Here are the results: 

— The scheduler described in Example 1 has been successfully verified: the 
quotienting of the formula in Example 3 by the two components is directly 
reduced to tt by the simplifications. 

— We applied the method to a verification problem for two Petri Nets considered
in the MARS project (*) and verified by HyTech (following the method
of [BF99b]). Here the problem consists in verifying that a place always
contains fewer than 2 tokens. The verification succeeds in every case (either
directly after the first step, or after the nil model-checking computation).

— We have tried to verify the ABR protocol. We adapted the model used
in [BF99a] for HyTech to HCMC. Here the first step gave a specification with
25 equations but the second step could not conclude due to approximations.

6 Conclusion and Future Work 

In this paper we have extended the compositional model-checking method defined
in [LL95] for timed automata to hybrid automata. Our work is twofold:

1. on the theoretical side we have proven that it is possible to do compositional
model-checking for hybrid automata; we have defined the logic $L_\nu^h$,
i.e. an extension of the modal $\mu$-calculus, so that this logic can express many
properties for the specification of reactive systems and is expressively complete
for linear hybrid automata.

2. we have implemented a prototype tool HCMC to handle verification of hybrid
automata where variables have slopes in $\{0, 1\}$.

As described in section 5, our method consists of two distinct steps: quotienting
and then constraints solving. Our current implementation uses DBMs as
a data structure to represent the regions of $\mathbb{R}^n$. Obviously we are only able to
manipulate over-approximations of the actual regions of our hybrid automata.

While this does no harm in step 1, as we only miss some simplifications, it is a real
impediment for step 2: if the approximation is too coarse we are not able to
prove some properties.

Our future work will consist in extending HCMC in order to deal with integer
slopes without approximations (this could be done by using a tool like HyTech
and a polyhedra library to do the final nil-model-checking and simplifications,
or a constraints solver over the reals). We can also easily extend our logic and
algorithm to cope with slopes within integer intervals, i.e. $\dot{x} \in [l, u]$, so that we
will be able to use HCMC on real-life examples and compare the performances
with other related tools.

(*) http://www.loria.fr/~xie/Mars.html
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A Proof of Theorem 1 

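For simplicity we restrict the proof to the case of two hybrid automata $H_1$ and
$H_2$ and a synchronization function $f$. The proof is carried out by induction on
the formula of $L_\nu^h$. For all inductive definitions except $[\delta_d]\varphi$ and its dual
$\langle \delta_d \rangle \varphi$ the induction steps are exactly the same as for timed automata
(see [LL95]). We here focus on the case of the rule $[\delta_d]\varphi$ (the case
$\langle \delta_d \rangle \varphi$ works in the same manner) involving continuous evolution, as this
case involves the main difference between hybrid and timed automata: the rates of
the variables are values of $\mathbb{Z}$ and not always 1.

Proof (Rule $[\delta_d]\varphi$). Let $((l_1, l_2), v_1.v_2)$ be a configuration of
$(H_1 \mid H_2)_f$. Given a $t \ge 0$, $v'_1$ (resp. $v'_2$) will denote $v_1 + Act(l_1).t$ (resp.
$v_2 + Act(l_2).t$) and given an activity vector $d$, $u'$ will denote $u + d.t$.

$\Rightarrow$. Assume $((l_1, l_2), v_1.v_2, u) \models_{\mathcal{D}} [\delta_d]\varphi$ and $Inv(l_2)(v_2) = tt$.

Then by definition of the $L_\nu^h$ semantics (Table 1), we have:

$\forall t \ge 0$, $((l_1, l_2), v_1.v_2) \xrightarrow{\epsilon(t)} ((l_1, l_2), v'_1.v'_2)$ implies $((l_1, l_2), v'_1.v'_2, u') \models_{\mathcal{D}} \varphi$

We have to show: $(l_1, v_1, v_2.u) \models_{\mathcal{D}'} [\delta_{d.Act(l_2)}](Inv(l_2) \Rightarrow \varphi /_f\, l_2)$.
Let $t \ge 0$ s.t. $(l_1, v_1) \xrightarrow{\epsilon(t)} (l_1, v'_1)$; there are two cases:

— $Inv(l_2)(v'_2) = ff$: therefore $(Inv(l_2) \Rightarrow \varphi /_f\, l_2)$ holds for $(l_1, v'_1, v'_2.u')$,

— $Inv(l_2)(v'_2) = tt$: therefore there exists
$((l_1, l_2), v_1.v_2) \xrightarrow{\epsilon(t)} ((l_1, l_2), v'_1.v'_2)$
because $Inv(l_2)(v_2) = Inv(l_2)(v'_2) = tt$ and $Inv(l_2)$ is $Act(l_2)$-strongly
connected. Moreover $((l_1, l_2), v'_1.v'_2, u') \models_{\mathcal{D}} \varphi$ entails that $(\varphi /_f\, l_2)$
holds for $(l_1, v'_1, v'_2.u')$ because we have the induction hypothesis:

$((l_1, l_2), v'_1.v'_2, u') \models_{\mathcal{D}} \varphi \;\Leftrightarrow\; (l_1, v'_1, v'_2.u') \models_{\mathcal{D}'} \varphi /_f\, l_2$

Then we have: $(l_1, v'_1, v'_2.u') \models_{\mathcal{D}'} (Inv(l_2) \Rightarrow \varphi /_f\, l_2)$.
Therefore we have: $(l_1, v_1, v_2.u) \models_{\mathcal{D}'} [\delta_{d.Act(l_2)}](Inv(l_2) \Rightarrow \varphi /_f\, l_2)$.

$\Leftarrow$. Assume $(l_1, v_1, v_2.u) \models_{\mathcal{D}'} [\delta_{d.Act(l_2)}](Inv(l_2) \Rightarrow \varphi /_f\, l_2)$.
We want to show $((l_1, l_2), v_1.v_2, u) \models_{\mathcal{D}} [\delta_d]\varphi$.

Let $t \ge 0$ s.t. $((l_1, l_2), v_1.v_2) \xrightarrow{\epsilon(t)} ((l_1, l_2), v'_1.v'_2)$. This entails that
there exists $(l_1, v_1) \xrightarrow{\epsilon(t)} (l_1, v'_1)$ and then
$(l_1, v'_1, v'_2.u') \models_{\mathcal{D}'} (Inv(l_2) \Rightarrow \varphi /_f\, l_2)$. Moreover
we know $Inv(l_2)(v'_2) = Inv(l_2)(v_2) = tt$ and then we have
$(l_1, v'_1, v'_2.u') \models_{\mathcal{D}'} (\varphi /_f\, l_2)$. Therefore by i.h. we obtain:
$((l_1, l_2), v'_1.v'_2, u') \models_{\mathcal{D}} \varphi$.
Then we have: $((l_1, l_2), v_1.v_2, u) \models_{\mathcal{D}} [\delta_d]\varphi$.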
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Abstract. Our experience with semi-exhaustive verification shows a severe 
degradation in usability for the corner-case bugs, where the tuning effort 
becomes much higher and recovery from dead-ends is more and more difficult. 
Moreover, when there are no bugs at all, shifting semi-exhaustive traversal to 
exhaustive traversal is very expensive, if not impossible. This makes the output 
of semi-exhaustive verification on non-buggy designs very ambiguous. 
Furthermore, since after the design fixes each falsification task needs to 
converge to full verification, there is a strong need for an algorithm that can 
handle efficiently both verification and falsification. We address these 
shortcomings with an enhanced reachability algorithm that is more robust in 
detecting corner-case bugs and that can potentially converge to exhaustive 
reachability. Our approach is similar to that of Cabodi et al. in partitioning the 
frontiers during the traversal, but differs in two respects. First, our partitioning 
algorithm trades quality for time resulting in a significantly faster traversal. 
Second, the subfrontiers are processed according to some priority function 
resulting in a mixed BFS/DFS traversal. It is this last feature that makes our 
algorithm suitable for both falsification and verification. 
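The traversal scheme described in this abstract can be pictured on an explicit state graph. The sketch below is only an analogy: it works on explicit states rather than on BDDs, splits each frontier into fixed-size chunks (a stand-in for a cheap partitioning heuristic), and takes the priority function as a parameter. All names and the chunking rule are assumptions made for illustration.

```python
import heapq


def prioritized_reach(init, succ, is_bad, priority, chunk=2):
    """Mixed BFS/DFS reachability over prioritized subfrontiers.

    Returns a bad state if one is reachable, else None once every
    subfrontier has been processed (i.e. reachability is exhaustive).
    """
    reached = set(init)
    heap, tick = [], 0                    # tick breaks priority ties

    def push(frontier):
        nonlocal tick
        states = sorted(frontier)
        for i in range(0, len(states), chunk):    # cheap partitioning
            part = states[i:i + chunk]
            heapq.heappush(heap, (priority(part), tick, part))
            tick += 1

    push(reached)
    while heap:
        _, _, part = heapq.heappop(heap)  # most promising subfrontier first
        new = set()
        for s in part:
            if is_bad(s):
                return s                  # falsification: bug found
            new |= set(succ(s)) - reached
        reached |= new
        if new:
            push(new)
    return None                           # nothing left: full verification


# Toy state graph on integers; deeper (larger) states are explored first.
succ = lambda s: [s + 1, 2 * s] if s < 20 else []
deepest_first = lambda part: -max(part)
print(prioritized_reach({1}, succ, lambda s: s == 13, deepest_first))
```

Because postponed subfrontiers stay in the queue instead of being discarded, the same loop that dives deep for bugs also terminates with complete coverage when no bug exists.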



1 Introduction 

Functional RTL validation is addressed today by two complementary technologies.
The more traditional one, simulation, has high capacity but covers only a tiny fraction
of all the possible behaviors of the design. In contrast, formal verification
guarantees full coverage of the entire state space but is severely limited in terms of
capacity. A number of hybrid approaches that combine the strengths of the two
validation technologies have emerged lately. One of them is semi-exhaustive
verification [1, 3, 4, 5, 9], which aims at exploring a more significant fraction of the
state space within the same memory and time limits. While maintaining high coverage,
similar to exhaustive verification, this technique reaches more buggy states, in less
time and with smaller memory consumption. In this family we can include several
heuristics, like high-density reachability [1] and saturated simulation [4, 5].



* Work partially supported by NSF grant CCR-9700061 and a grant from Intel Corporation. 

E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 389-402, 2000. 

© Springer-Verlag Berlin Heidelberg 2000




390 R. Fraer et al. 



We have run extensive experiments using the high-density reachability on a rich 
set of industrial designs, and we have reported encouraging results [15]. As opposed 
to previous studies, where semi-exhaustive algorithms are evaluated on the basis of 
their state space coverage, our results show these algorithms to be highly effective in 
bug finding too. We have identified two classes of problems where semi-exhaustive 
verification is particularly beneficial: 

- Dense bugs: Designs characterized by a high density of buggy states. The 
exhaustive verification usually finds these bugs too, but the semi-exhaustive one 
reaches the buggy states much faster. 

- Corner-case bugs: Designs characterized by sparsely distributed bugs in the state 
space such that the exhaustive verification blows up long before reaching them. 
These bugs are now within the reach of semi-exhaustive algorithms that can go 
deeper in the state space, while keeping the memory consumption under control. 

Nevertheless, the corner-case bugs require a much higher tuning effort. Experience 
shows that often only a specific subsetting heuristic and a particular threshold 
lead the semi-exhaustive algorithm to a buggy state, and very small variations of 
this “golden” setting are bound to fail. Failure here means getting stuck in a dead-end, 
where no more new states are found, and yet we are not sure if all the reachable states 
have been covered. The dead-end recovery algorithm of [11] addresses this problem 
by using the scrap states (the non-dense subsets, ignored by previous subsettings) to 
regenerate the traversal. The scrap states are partitioned as well, similarly to [11]. 
However, we had little success with this algorithm, as the BDDs of the scrap 
partitions get larger and larger and more difficult to use. 

For the same reason, shifting from semi-exhaustive traversal to an exhaustive one 
is very expensive, and often impossible. This makes the semi-exhaustive approach 
unsuitable for verification tasks (where there are no bugs at all). The ability to address 
both verification and falsification problems is important in an industrial context, since 
after the design fixes each falsification task needs to converge to full verification. 
Moreover, a significant problem in industrial-size projects is to ensure that the process 
of fixing one design problem does not introduce another. In the context of 
conventional testing this is checked through regression testing [19]. If consecutive test 
suites check N properties, a failure in one property may require re-testing all the 
previous suites, once a fix has been made. Efficient regression testing, clearly, 
requires an algorithm powerful both for verification and falsification. 

To overcome the above drawbacks in the semi-exhaustive approach, we introduce 
in this paper an enhanced reachability analysis that is efficient both for falsification 
and verification and that is less sensitive to tuning. For falsification, we still want to 
use the mixed BFS/DFS strategy that is at the core of the semi-exhaustive approach. 
As for verification, we replace the dense/non-dense partitioning of [1] with a balanced 
partitioning as in Cabodi et al. [13]. As noted above, the density criterion is unsuitable 
for verification, due to the difficulty of exploring the non-dense part of the state 
space. 

So our approach follows the lines of Cabodi et al. [13], producing at each step a set 
of balanced partitions instead of one dense partition, but differs in two respects. First, 
our partitioning algorithm trades quality for time, resulting in a significantly faster 
traversal. The partitioning is based, as in [13], on selecting a splitting variable. We 
have observed that the selection of the splitting variable can be computationally very 
expensive and can significantly increase the overall traversal time. Therefore, we 
make use of a new heuristic to select only a subset of the variables as candidates for 
the splitting, as opposed to [13] where all the variables are involved in the selection 
process. Although the new partitioning algorithm may generate less balanced 
partitions, it is faster than the original and reduces the overall traversal time. 

Second, while [13] does a strict breadth-first search as in classic reachability, 
we make use of a mixed breadth-first/depth-first traversal controlled by a prioritized 
queue of partitions. We check the correctness of the invariant properties on-the-fly 
during the reachability analysis. The mixed approach, in addition to enjoying the 
benefits reported by Cabodi et al. for balanced decomposition (i.e., low peak memory 
requirement and drastic CPU-time and capacity improvement), makes the search more 
robust in case of falsification, by getting faster to the bugs. On-the-fly verification 
[16] clearly reduces the time required to get to the bugs. Experiments on Intel designs 
show a marked improvement on the existing exact and partial reachability 
techniques. 

The paper starts with a summary of related work in Section 2. Section 3 introduces 
the new prioritized traversal as well as the fast splitting algorithm. In Section 4 we 
report experimental results comparing prioritized traversal to existing traversal 
algorithms and the new splitting algorithm to the original one. We conclude in 
Section 5 by summarizing the contributions of this work. 



2 Related Work 

Semi-exhaustive verification addresses the concerns of practicing verifiers by 
shifting the focus from verification to falsification. Rather than ensuring the absence 
of bugs, it turns the verification tool into an effective bug hunter. This hybrid 
approach aims at improving over both simulation and formal verification in terms of 
state space coverage and capacity respectively. 

Our usage of semi-exhaustive verification follows the lines of [1, 3, 5, 9]: it is 
based on subsetting the frontiers during state space exploration, by means of various 
under-approximation techniques, whenever these frontiers reach a given 
threshold. 

The effectiveness of the semi-exhaustive verification is clearly very sensitive to the 
nature of the algorithm employed for subsetting the frontiers. A large number of BDD 
subsetting algorithms have been proposed lately in the model checking literature. 
Each of them is necessarily a heuristic, attempting to optimize different criteria of the 
chosen subset. An important class of heuristics takes the density of the BDDs as the 
criterion to be optimized, where density is defined as the ratio of states represented by 
the BDD to the size of the BDD. This relates to the observation that large BDDs are 
needed for representing sparse sets of states (as is the case for the frontiers). 
Removing the isolated states can lead to significant reductions in the size of the 
BDDs. 

Ravi and Somenzi [1] have introduced the first algorithms for extracting dense 
BDD subsets, Heavy-Branch (HB) and Short-Path (SP). Independently, Shiple 
proposed in his thesis [8] the algorithm Under-Approx (UA) that also optimizes the 
subset according to the density criterion. Recently, Ravi et al. [9] proposed 
Remap-Under-Approx (RUA) as a combination of UA with more traditional BDD 
minimization algorithms like Constrain and Restrict [10]. A combined algorithm is 
Compress (COM) [9], which applies first SP with the given threshold and then RUA 
with a threshold of 0 to increase the density of the result. Although more expensive, 
the combination of the two algorithms is supposed to produce better results. 

The Saturation (SAT) algorithm [3,5] is based on a different idea. Rather than 
keeping as many states as possible, it attempts to preserve the interesting states. In the 
context of [3,5] the control states are defined as the interesting ones. The heuristic 
makes sure to saturate the subset with respect to the control states, i.e. that each 
possible assignment to the control variables is represented exactly once in the subset. 
In terms of BDDs, this is implemented by Lin and Newton’s Cproject operator [12]. 

Previous studies [1,3,5, 9] advocate the merits of dense frontier subsetting 
techniques on the basis of their coverage of the reachable state space and the density 
of the approximated frontiers. We have evaluated in [15] the effectiveness of these 
techniques for bug hunting and confirmed their usefulness in the fast detection of 
design or specification errors. However, a major shortcoming of these techniques is 
the difficulty, and in many cases the inability, to provide formal guarantees of 
correctness, in case no bug is found. In other words, these techniques, although useful 
for falsification, are not practical for verification. Furthermore, these techniques suffer 
from high tuning effort in case they are used to find the corner-case bugs. 

Our approach is closely related to the work of Cabodi et al. [13], which considers 
as a key goal the good decomposition of state sets. They adopt a technique aimed at 
producing balanced partitions with a possibly minimum overall BDD size. This size is 
often slightly larger than the original one, but this drawback is largely overcome by 
the benefits derived from partitioned storage and computations. We have observed 
that the balanced partitioning approach is very effective in reducing peak memory 
requirements and it can drastically decrease the overall BDD size. From our 
experience, the benefit of balanced partitioning over dense subsetting is that it 
requires less tuning effort and it can much more easily and efficiently converge to 
exact reachability. On the other hand, the BDD splitting algorithm used in [13] may 
be computationally very expensive. For average designs the time spent in BDD 
splitting may outweigh the overall benefits of balanced partitioning. Cabodi et al. 
make use of a partitioned breadth-first search. For falsification we have observed the 
mixed breadth-first/depth-first approach to be much more effective. The enhanced 
reachability analysis that we propose enjoys the benefits of balanced partitioning and 
addresses its shortcomings. 

Finally, the work of Narayan et al. [14] on partitioned ROBDDs takes a more 
radical approach where all the BDDs in the systems (not just the frontier and the 
reachable states) are subject to partitioning. Moreover, different partitions can be 
reordered with different variable orders. This approach can cope better with the BDD 
explosion problem, but it involves significant effort in maintaining the coherency of 
the infrastructure where several BDD variable orders coexist simultaneously. 



3 Improved Reachability Search 

A finite state machine is an abstract model describing the behavior of a sequential 
circuit. A completely specified FSM M is a 5-tuple $M = (I, S, \delta, \lambda, S_0)$, where $I$ is the 
input alphabet, $S$ is the state space, $\delta$ is the transition relation contained in $S \times I \times S$, 
and $S_0 \subseteq S$ is the initial state set. BDDs are used to represent and manipulate 
functions and state sets, by means of their characteristic functions. In the rest of the 
paper, we make no distinction between BDDs and sets of states. 



3.1 Invariant Verification 

A common verification problem for hardware designs is to determine if every state 
reachable from a designated set of initial states lies within a specified set of “good 
states” (referred to as the invariant). This problem is variously known as invariant 
verification, or assertion checking. Invariant verification can be performed by 
computing all states reachable from the initial states and checking that they all lie in 
the invariant. This reduces the invariant verification problem to the one of traversing 
the state transition graph of the design, where the successors of a state are computed 
according to the transition relation of the model. Moreover, traversing the state graph 
in a breadth-first order makes it possible to work on sets of states that are symbolically 
represented as BDDs [6]. This is an instance of the general technique of symbolic 
model checking [7]. 

Given an invariant inv and an initial set of states $S_0$, reachability analysis starts 
with the BDD for $S_0$ and uses BDD functions to iterate up to a fixed point, which is 
the set of all the states reachable from $S_0$, using the Img operator. If the set of 
reachable states R is contained in inv, then the invariant is verified to be true; 
otherwise the invariant is not satisfied by the model and a counter-example needs to 
be generated. 
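
The fixed-point iteration described above can be illustrated with a minimal sketch. It is not the BDD-based implementation used in this work: explicit Python sets stand in for BDDs, and the names `image`, `classic_reachability` and `counter_succ` are hypothetical, introduced only for this example.

```python
def image(succ, states):
    """Img operator on an explicit set: all one-step successors."""
    return {t for s in states for t in succ(s)}


def classic_reachability(succ, init, inv):
    """Breadth-first fixed point; returns (invariant holds?, reached set)."""
    reached = set(init)
    frontier = set(init)
    while frontier:
        # New frontier = successors that have not been reached yet.
        frontier = image(succ, frontier) - reached
        reached |= frontier
    # The invariant is checked only once, after the fixed point.
    return reached <= inv, reached


def counter_succ(s):
    """Toy model (an assumption of this sketch): a counter modulo 6."""
    return [(s + 1) % 6]


holds, reached = classic_reachability(counter_succ, {0}, set(range(6)))
```

In a symbolic tool the sets would be BDDs and `image` would be computed from the transition relation; the control flow is the same.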

The primary limitation of this classic traversal algorithm is that the BDDs 
encountered at each iteration, F, commonly referred to as the frontier, and R, referred to as 
the reachable states, can grow very large, leading to a blow-up in memory or to a 
verification time-out. Moreover, it may be impossible to perform image computation 
because of the size of the BDDs involved in the intermediate computations. 

Subsetting traversal introduced by Ravi and Somenzi [1] and Partitioned traversal 
proposed by Cabodi et al. [13] (Figure 1) tackle these shortcomings of the classic 
traversal by decomposing state sets when they become too large to be represented as a 
monolithic BDD or when image computation is too expensive. 

Subsetting traversal keeps the size of the frontiers under control by performing 
image computation only on a dense subset of the frontier, each time the size of the 
current frontier reaches a given threshold. When no new states are produced, the 
sub-traversal may have reached the actual fixed point (the one that would be obtained 
during pure BFS traversal) or a dead-end, which arises from having discarded some 
states during the process of subsetting. Theoretically, termination could be checked by 
computing the image of the current reached set of states (as in Figure 1). In practice, this 
is rarely feasible, given the size of the BDD representing the reachable states. A 
dead-end resolution algorithm was proposed in [11] that keeps the scrap states (ignored by 
previous subsettings) in a partitioned form to regenerate the traversal. However, we 
had little success with this algorithm, as the BDDs of the scrap partitions (the 
non-dense part of the state space) get larger and larger and more difficult to use. 



SUBSETTING_TRAVERSAL (δ, S_0, th) 
    R = F = S_0; 
    while (F ≠ ∅) { 
        F = IMG(δ, F) - R; 
        R = R ∪ F; 
        if (F = ∅)    // check if dead-end 
            F = IMG(δ, R) - R; 
        if (size(F) > th) 
            F = subset(F); 
    } 

PARTITIONED_TRAVERSAL (δ, S_0, th) 
    R^p = F^p = S_0; 
    while (F^p ≠ ∅) { 
        T^p = {IMG(δ, f) | f ∈ F^p}; 
        F^p = SET_DIFF(T^p, R^p); 
        R^p = SET_UNION(F^p, R^p); 
        F^p = RE_PARTITION(F^p, th); 
        R^p = RE_PARTITION(R^p, th); 
    } 


Figure 1. Subsetting traversal [1] and Partitioned Traversal [13] 
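
As an illustration of the subsetting procedure and its dead-end check, the following is a minimal explicit-state sketch. It makes several assumptions: Python sets replace BDDs, `keep_smallest` is a hypothetical stand-in for a density-based subsetting heuristic, and the recovered states are added to the reached set immediately so that they are recorded.

```python
def image(succ, states):
    """All one-step successors of a set of states."""
    return {t for s in states for t in succ(s)}


def keep_smallest(frontier, th):
    """Hypothetical subsetting heuristic: keep the th smallest states."""
    return set(sorted(frontier)[:th])


def subsetting_traversal(succ, init, th, subset=keep_smallest):
    reached = set(init)
    frontier = set(init)
    while frontier:
        frontier = image(succ, frontier) - reached
        reached |= frontier
        if not frontier:
            # Either the true fixed point or a dead-end caused by earlier
            # subsetting; the image of everything reached tells which.
            frontier = image(succ, reached) - reached
            reached |= frontier
        if len(frontier) > th:
            # Only a subset of the frontier is expanded in the next step.
            frontier = subset(frontier, th)
    return reached


def tree_succ(s):
    """Toy model (assumption): a complete binary tree with nodes 0..14."""
    return [2 * s + 1, 2 * s + 2] if s < 7 else []


all_states = subsetting_traversal(tree_succ, {0}, th=1)
```

With a threshold of one state the traversal dives along a single branch, hits a dead-end at each leaf, and recovers by computing the image of the whole reached set, which is exactly the step that becomes impractical on large BDDs.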

Partitioned traversal is a BFS one, just like the classic algorithm, except that it 
keeps the frontier F as a set of partitions $F^p$, and the reachable states R as a set of 
partitions $R^p$. The $T^p$ sets are the results of image computation for each of the 
subfrontiers $f \in F^p$, which may be either re-combined or partitioned again. This is done by 
functions like SET_DIFF, SET_UNION, and RE_PARTITION that work on 
partitioned sets. The intuition behind this approach is to overcome the complexity 
issue of large frontiers by splitting the frontiers into balanced partitions with minimum 
overall size, and to decrease peak BDD size by performing the image computation on the 
partitions of the frontier. One advantage of partitioned traversal over subsetting 
traversal is that it can more easily converge to exhaustive reachability. During the 
whole computation, all the partitions are preserved and image computation is 
performed on all the partitions. The computation never gets into dead-ends and 
consequently there is no need to perform image computation on the reached states. On 
the other hand, a major drawback of this approach is the time spent in selecting the 
BDD variable used to split the frontier into balanced partitions. Moreover, if this traversal is 
used for falsification purposes, it is time consuming to get to deep bugs, since the 
reachable states are still computed in a BFS fashion. 
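
A minimal explicit-state sketch of partitioned traversal follows. It is an illustration under stated assumptions, not the algorithm of [13]: Python sets replace BDDs, partitions are split by halving a sorted list of states instead of by cofactoring on a splitting variable, and the helper names are hypothetical.

```python
def image(succ, states):
    """All one-step successors of a set of states."""
    return {t for s in states for t in succ(s)}


def re_partition(parts, th):
    """Split every partition larger than th until all partitions fit."""
    if th < 1:
        raise ValueError("threshold must be at least 1")
    work = [set(p) for p in parts if p]
    out = []
    while work:
        p = work.pop()
        if len(p) <= th:
            out.append(p)
        else:
            ordered = sorted(p)  # stand-in for variable-based splitting
            mid = len(ordered) // 2
            work += [set(ordered[:mid]), set(ordered[mid:])]
    return out


def partitioned_traversal(succ, init, th):
    reached = re_partition([init], th)      # R^p
    frontier = re_partition([init], th)     # F^p
    while frontier:
        # A real implementation would keep R^p partitioned here; the
        # union is materialised only to keep the sketch short.
        seen = set().union(*reached)
        new_parts = []
        for f in frontier:                  # T^p, one image per partition
            fresh = image(succ, f) - seen   # SET_DIFF against R^p
            seen |= fresh                   # keeps partitions disjoint
            if fresh:
                new_parts.append(fresh)
        reached = re_partition(reached + new_parts, th)  # SET_UNION
        frontier = re_partition(new_parts, th)
    return reached


def tree_succ(s):
    """Toy model (assumption): a complete binary tree with nodes 0..14."""
    return [2 * s + 1, 2 * s + 2] if s < 7 else []


parts = partitioned_traversal(tree_succ, {0}, th=2)
```

No partition is ever discarded, so the loop ends only at the true fixed point and no dead-end check is needed.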

Our approach is related to both [1] and [13]. Like [13], it aims at generating balanced 
partitions instead of dense subsets, since we have experienced that balanced 
partitioning requires less tuning effort than dense subsetting and makes it easy to 
converge to full verification. Similarly to [1], we make use of a mixed 
breadth-first/depth-first search strategy in order to reach the deep bugs faster. Moreover, we 
propose a fast splitting algorithm that reduces the overall traversal time. 

Consequently, our approach enjoys the benefits of both subsetting and partitioned 
traversal, while addressing the shortcomings of the two approaches: unsuitability of 
subsetting traversal for full verification, inefficiency of partitioned traversal for 
falsification and performance penalty induced by the splitting algorithm in partitioned 
traversal. 



3.2 Fast Frontier Splitting 

Cabodi et al. [13] proposed a BDD splitting algorithm based on single variable 
selection, that aims at producing balanced partitions with minimum overall BDD size. 
The algorithm is based on a procedure estimating the size of the cofactors with 
respect to a given variable. The cost of splitting a BDD with variable v is computed 
from the estimated node counts. The cost function measures the potential of 
a variable v to generate balanced BDDs with minimum overall BDD size. The main 
drawback of this approach is that the estimated node counts may be quite inaccurate, 
resulting in unbalanced partitions, because the estimates are computed without considering 
reductions and sub-tree sharing. Therefore, in [18] an enhanced procedure was 
proposed. The enhanced algorithm takes the sharing factor into consideration while 
estimating the size of the cofactors with respect to a given variable. This refinement 
yields very precise estimates but is much slower than the original one. 

Moreover, both algorithms estimate the size of the BDD representing $f$ constrained 
either by $v$ or $\neg v$ for each variable $v$ in the true support of $f$. The cofactor size 
estimation for each variable in the support is computationally very expensive, and 
is therefore a major drawback of the partitioned reachability analysis. 

Our frontier splitting algorithm aims at achieving a good time/accuracy trade-off. 
We still want to use the accurate cofactor estimation of [18], but only on a subset of 
the variables. Therefore, we propose a two-stage algorithm⁺. The first stage prioritizes 
the variables according to their splitting quality. The second stage calculates accurate 
cofactor size estimations as in [18] for the subset of variables chosen to be the best 
candidates at the first stage. The performance improvement is mainly due to the fact 
that accurate cofactor size estimation is performed on only a subset of the variables. Additionally, the 
first stage of the algorithm, which prioritizes the variables with respect to their 
splitting quality, is quite fast. This first stage is comparable to the function counting 
the nodes of a BDD, so its time complexity is linear in the size of the BDD. 
Therefore, we get a significant performance gain that outweighs the degradation in 
splitting quality, as shown by the results in Section 4. 

In order to prioritize the variables with respect to their splitting quality, we 
estimate the size of every sub-function $f$ in a BDD (i.e., the size of the sub-DAG 
under every BDD node), which we refer to as $c[f]$. Clearly, $c[f] \le c[f_0] + c[f_1] + 1$, 
and the equality holds only when there is no sharing between $f_0$ and $f_1$. Using a DFS 
traversal of the BDD representing the function $f$, the exact $c[f]$ may be calculated as 
$c[f] = c'[f_0] + c'[f_1] + 1$, where $c'[f_0]$ and $c'[f_1]$ are the numbers of unvisited nodes 
encountered during the traversal of $f_0$ and $f_1$, respectively. If $f_0$ is traversed first, then 
$c'[f_0] = c[f_0]$ and $c'[f_1] \le c[f_1]$, since there may be node sharing between $f_0$ and $f_1$. 
Similarly, $c'[f_1] = c[f_1]$ and $c'[f_0] \le c[f_0]$ if $f_1$ is traversed first. Therefore, the value of 
$c'[f_0]$ differs depending on whether the traversal starts from $f_0$ or $f_1$. The maximum of the two values 
gives an accurate estimate for $c[f_0]$. The same holds for $c[f_1]$. 
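
The estimate can be sketched as two depth-first passes over the BDD, one expanding the 0-child first and one expanding the 1-child first, keeping the maximum count per node. The representation below is an assumption made for illustration: the BDD is a dictionary mapping a node name to `(variable, low child, high child)`, the terminals are the integers 0 and 1, and terminals are not counted as nodes.

```python
def estimate_sizes(bdd, root):
    """Two-pass estimate of c[f] for every internal node f under root."""
    c = {}
    # Tuple positions of the children: 1 is the 0-child, 2 is the 1-child.
    for first, second in ((1, 2), (2, 1)):
        visited = set()

        def dfs(node):
            # Terminals and already visited (shared) nodes add nothing new.
            if node not in bdd or node in visited:
                return 0
            visited.add(node)
            new_first = dfs(bdd[node][first])     # c'[...] of first child
            new_second = dfs(bdd[node][second])   # c'[...] of second child
            count = new_first + new_second + 1
            c[node] = max(c.get(node, 0), count)
            return count

        dfs(root)
    return c


# Toy BDD (assumption): n1 and n2 share the node n3.
toy_bdd = {
    "r": (0, "n1", "n2"),
    "n1": (1, 0, "n3"),
    "n2": (1, "n3", 1),
    "n3": (2, 0, 1),
}
sizes = estimate_sizes(toy_bdd, "r")
```

On this toy BDD the first pass alone would give 1 for `n2`, because the shared node `n3` was already visited through `n1`; the second pass corrects it to 2. The root value 4 is strictly smaller than `sizes["n1"] + sizes["n2"] + 1`, which reflects the sharing. Each pass visits every node once, so the cost is linear in the BDD size.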

+ We would like to thank one of the reviewers for pointing us to a more recent paper of Cabodi 
et al. [17] proposing similar improvements to the splitting algorithm. However, their 
estimation of cofactor sizes is based on computing different metrics during the DFS traversal 
of the BDD. 

Based on these facts, we make two traversals of the BDD of F: in the first 
traversal we expand the 0-edge of every node $f$ of F first, while in the second pass the 
1-edge is expanded first. At each pass, $c[f]$ is updated such that its final value is the 
maximum value resulting from the two traversals. Note that the resulting $c[f]$ is only 
an estimate of the real size of $f$. To measure the splitting quality of each variable v in 
the support of F, we make use of the estimated sizes of the sub-functions in F. We 
experimented with several cost functions and found the following one to be 
satisfactory: 

$$\mathrm{COST}(v) = \frac{\displaystyle w \cdot \sum_{f:\,\mathrm{var}(f)=v} \frac{\mathrm{abs}(c[f_0]-c[f_1])}{\max(c[f_0],\,c[f_1])} \;+\; \sum_{f:\,\mathrm{var}(f)=v} \frac{c[f_0]+c[f_1]+1}{c[f]}}{\mathrm{card}(\{f \mid \mathrm{var}(f)=v\})}$$

where $\mathrm{var}(f)$ is the root variable of the sub-BDD $f$, so the summations in the 
numerator are taken over all the BDD nodes $f$ that are at the level of the variable v. 
The denominator is simply the number of all the nodes at the level of v. The first term 
in the numerator represents the balance between the cofactors, and the second term is 
the node sharing. The weight w of the balancing term was empirically determined to 
be 0.4. 
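
A minimal sketch of the first-stage cost computation is given below. It rests on assumptions that should be read as such: the balance term is normalised by the larger child estimate, only the balance term carries the weight w, terminals have size 0, and the BDD uses a hypothetical dictionary representation `name -> (variable, low child, high child)`. The size table `c` is supplied by the caller.

```python
from collections import defaultdict


def split_costs(bdd, c, w=0.4):
    """Average per-level cost: weighted imbalance plus a sharing term."""
    total = defaultdict(float)
    nodes_at_level = defaultdict(int)
    for node, (var, lo, hi) in bdd.items():
        c0 = c.get(lo, 0)  # terminals are not in c and count as 0
        c1 = c.get(hi, 0)
        # Guard against division by zero when both children are terminals.
        balance = abs(c0 - c1) / max(c0, c1, 1)
        sharing = (c0 + c1 + 1) / c[node]
        total[var] += w * balance + sharing
        nodes_at_level[var] += 1
    return {v: total[v] / nodes_at_level[v] for v in total}


# Toy BDD and size table (assumptions made for this example only).
toy_bdd = {
    "r": (0, "n1", "n2"),
    "n1": (1, 0, "n3"),
    "n2": (1, "n3", 1),
    "n3": (2, 0, 1),
}
toy_sizes = {"r": 4, "n1": 2, "n2": 2, "n3": 1}
costs = split_costs(toy_bdd, toy_sizes)
```

A sharing term above 1 signals that the two children of a node overlap, so that splitting at this level would duplicate the shared part in both partitions.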

We then proceed to the second stage, where variables are prioritized in increasing 
order of their COST(v), and an accurate estimation of the cofactors’ sizes is calculated on 
the best N variables using the size estimation algorithm in [18]. The variable with the 
minimum larger cofactor, that is $\min_v \max(|f_v|, |f_{\neg v}|)$, is chosen as the splitting variable 
v. We experienced that calculating an accurate estimate for a small set of variables (for 
instance, N = 15) is sufficient to get to a nearly optimal splitting. 
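
The second stage can be sketched as follows. This is an illustration, not the procedure of [18]: the cofactor sizes are obtained here by actually building each reduced cofactor and counting its nodes, which stands in for the size estimator. The dictionary BDD representation, the function names and the cost values in the example are all hypothetical.

```python
def cofactor_size(bdd, root, var, val):
    """Node count of the reduced cofactor of root with var fixed to val."""
    memo = {}

    def restrict(node):
        if node not in bdd:               # terminal 0 or 1
            return node
        if node not in memo:
            v, lo, hi = bdd[node]
            if v == var:
                memo[node] = restrict(hi if val else lo)
            else:
                low, high = restrict(lo), restrict(hi)
                # Reduction rule: drop a node whose children coincide.
                memo[node] = low if low == high else (v, low, high)
        return memo[node]

    seen = set()

    def count(node):
        # Structurally equal tuples are equal, so shared nodes count once.
        if isinstance(node, tuple) and node not in seen:
            seen.add(node)
            count(node[1])
            count(node[2])

    count(restrict(root))
    return len(seen)


def choose_split_variable(bdd, root, costs, n_best=15):
    """Among the n_best cheapest variables, minimise the larger cofactor."""
    candidates = sorted(costs, key=costs.get)[:n_best]
    return min(
        candidates,
        key=lambda v: max(cofactor_size(bdd, root, v, 0),
                          cofactor_size(bdd, root, v, 1)),
    )


# Toy BDD for (x0 and x1) or (not x0 and x2 and x3); costs are made up.
toy_bdd = {
    "r": (0, "a", "b"),
    "a": (2, 0, "c"),
    "c": (3, 0, 1),
    "b": (1, 0, 1),
}
toy_costs = {0: 1.0, 1: 1.3, 2: 1.1, 3: 1.2}
best = choose_split_variable(toy_bdd, "r", toy_costs)
```

Restricting the expensive size computation to `n_best` candidates is what keeps the second stage affordable when the support is large.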



3.3 On-the-Fly Verification 

We have incorporated on-the-fly verification of the invariants [16] into our reachability 
algorithms. Instead of checking the invariant only after all reachable states are 
computed, this check is performed after each computation of a new frontier. This 
feature is critical for falsification problems, where the different algorithms are 
evaluated with respect to their success in bug finding. This should be contrasted with 
previous work [1, 11, 13], where the evaluation is strictly based on the number of 
reached states. 
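
The effect of checking the invariant after every frontier can be shown with a small explicit-state sketch. Python sets stand in for BDDs, and the function name and the toy counter model are assumptions of this example.

```python
def reach_on_the_fly(succ, init, inv):
    """BFS reachability that checks the invariant on every new frontier.

    Returns (invariant holds?, number of image steps performed).
    """
    reached = set(init)
    frontier = set(init)
    steps = 0
    if not frontier <= inv:
        return False, steps
    while frontier:
        frontier = {t for s in frontier for t in succ(s)} - reached
        reached |= frontier
        steps += 1
        if not frontier <= inv:
            # A violating state was just reached: stop immediately.
            return False, steps
    return True, steps


def counter_succ(s):
    """Toy model (assumption): a counter modulo 100."""
    return [(s + 1) % 100]


buggy = reach_on_the_fly(counter_succ, {0}, set(range(100)) - {3})
clean = reach_on_the_fly(counter_succ, {0}, set(range(100)))
```

In the buggy case the traversal stops after 3 image steps, whereas a check deferred to the fixed point would need all 100.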



3.4 Prioritized Traversal 

Figure 2 shows the pseudo-code for the prioritized traversal. The set of states that 
satisfy the invariant is represented by inv. Prioritized traversal can be performed with 
a transition relation $\delta$ in monolithic or partitioned form. The results in Section 4 were 
obtained with a partitioned transition relation. 

R represents the set of states reached so far, initially equal to the set of initial states 
$S_0$. Fqueue is the prioritized queue of the frontiers that have yet to be processed. 
Initially Fqueue contains just the set $S_0$. As long as the queue is not empty, we pop its 
first element F, compute its image and reinsert the result into Fqueue according to the 
priority function. Whenever a new frontier F is inserted into the queue, if its size 
exceeds the threshold th, F is completely partitioned (i.e., decomposed until the sizes 
of all its sub-partitions are below th) using the splitting algorithm explained 
in Section 3.2. The new partitions are inserted in the queue in the order imposed by 
the priority function. We have experimented with several priority functions 
(minimum_size, density, frontier_age). Our experience, as can be observed from the 
results reported in Section 4, shows that minimum_size is the most efficient and the 
least sensitive priority function. At each fixed-point iteration step, the correctness of 
the invariant is checked. The traversal ends if the invariant is falsified during the 
reachability analysis. 



PRIORITIZED_TRAVERSAL (δ, S_0, th, prio_func) 
    R = S_0; 
    INVARIANT_CHECK(S_0, inv); 
    INSERT(S_0, Fqueue, th, prio_func); 
    while (Fqueue ≠ []) { 
        F = POP(Fqueue); 
        F = IMG(δ, F) - R; 
        R = R ∪ F; 
        INVARIANT_CHECK(F, inv); 
        INSERT(F, Fqueue, th, prio_func); 
    } 

INSERT (F, Fqueue, th, prio_func) 
    if (size(F) > th) { 
        F^p = COMPLETE_PARTITION(F, th); 
        foreach f ∈ F^p 
            insert f in Fqueue using prio_func; 
    } 
    else if (F ≠ ∅) 
        insert F in Fqueue using prio_func; 

INVARIANT_CHECK (F, inv) 
    if (F ⊄ inv) report the bug and exit; 



Figure 2. Prioritized traversal 



Prioritized traversal is a mixed BFS/DFS traversal, which degenerates to full 
DFS or BFS by always inserting the new frontiers at the top or the bottom of the 
queue, respectively. Therefore, as a search strategy, it subsumes the partitioned traversal 
introduced by Cabodi et al. [13]. Note that our approach, unlike [13], does not 
re-partition the reachable states and the frontiers at each step. We estimate the 
re-partitioning to be very expensive, and we avoid using a partitioned SET_DIFF (Figure 1) 
to detect the fixpoint. 
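
A minimal explicit-state sketch of prioritized traversal is shown below. It is written under stated assumptions rather than as the implementation evaluated in Section 4: Python sets replace BDDs, `complete_partition` halves a sorted list of states instead of cofactoring on a splitting variable, the default priority is the partition size (the minimum_size policy), and all function names are hypothetical.

```python
import heapq
import itertools


def image(succ, states):
    """All one-step successors of a set of states."""
    return {t for s in states for t in succ(s)}


def complete_partition(states, th):
    """Split a set until every piece holds at most th states."""
    if th < 1:
        raise ValueError("threshold must be at least 1")
    work, parts = [set(states)], []
    while work:
        p = work.pop()
        if len(p) <= th:
            if p:                       # empty frontiers are not queued
                parts.append(p)
        else:
            ordered = sorted(p)         # stand-in for variable splitting
            mid = len(ordered) // 2
            work += [set(ordered[:mid]), set(ordered[mid:])]
    return parts


def prioritized_traversal(succ, init, inv, th, prio=len):
    """Returns (invariant holds?, states reached when the search stopped)."""
    reached = set(init)
    if not reached <= inv:
        return False, reached
    queue = []
    tick = itertools.count()            # tie-breaker, sets are not ordered

    def insert(frontier):
        for part in complete_partition(frontier, th):
            heapq.heappush(queue, (prio(part), next(tick), part))

    insert(reached)
    while queue:
        _, _, frontier = heapq.heappop(queue)
        frontier = image(succ, frontier) - reached
        reached |= frontier
        if not frontier <= inv:         # on-the-fly invariant check
            return False, reached
        insert(frontier)
    return True, reached


def tree_succ(s):
    """Toy model (assumption): a complete binary tree with nodes 0..14."""
    return [2 * s + 1, 2 * s + 2] if s < 7 else []


verified = prioritized_traversal(tree_succ, {0}, set(range(15)), th=2)
falsified = prioritized_traversal(tree_succ, {0}, set(range(14)), th=2)
```

Because every partition stays in the queue until it has been expanded, an empty queue means the exact fixed point has been reached; no image of the reached set is required. Changing `prio` changes only the order in which partitions are expanded.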



4 Experimental Results 

The results reported in this section support our main claims about the 
advantages of the new algorithms proposed in this paper. Section 4.1 compares our 
fast splitting algorithm against the one in [18] by measuring the impact of the 
time/quality tradeoff on the overall performance. Section 4.2 compares prioritized 
traversal with classic, subsetting and partitioned traversal on a set of verification and 
falsification problems. While these problems can be handled by most of the 
algorithms, our data shows that prioritized traversal behaves consistently well across 
the spectrum. It is the only algorithm that is robust for both verification and 
falsification. Finally, Section 4.3 reports successful results of prioritized traversal on a 
number of challenging testcases (both verification and falsification) that no other 
algorithm can handle. This emphasizes the capacity improvement of our new 
algorithm. 



4.1 Comparing Splitting Algorithms 

For the purpose of this experiment we have run prioritized traversal using two 
different BDD splitting algorithms: the one implemented in [18] denoted by SPLIT 
below, and our fast splitting algorithm denoted by SPLIT+. Running the two 
algorithms on four full verification tasks (requiring exact reachability) allowed us to 
compare the impact of the splitting algorithms on the overall traversal time. Table 1 
describes the number of variables of each circuit (latches and inputs), the threshold 
that triggers frontier partitioning, as well as the time (seconds) and memory (MB) for 
the complete traversal. 



ckt       ckt1              ckt2              ckt3              ckt4 
vars      81 lat/101 inp    79 lat/80 inp     136 lat/73 inp    129 lat/82 inp 
thresh    50000 nodes       100000 nodes      100000 nodes      100000 nodes 
Trials    time     mem      time     mem      time     mem      time     mem 
SPLIT     327      169      368      91       3880     219      3522     190 
SPLIT+    244      169      309      98       2840     219      2359     185 


Table 1. Impact of splitting algorithms on overall traversal time 



While the speed of the splitting algorithm obviously affects the traversal time, so 
does the accuracy of the splitting. Indeed, a poor split requires additional splitting 
steps and increases the number of iterations of the traversal. In this respect, the speed 
of SPLIT+ compensates for its occasional loss in accuracy, causing it to perform 
systematically better than SPLIT. 
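
The relative gain can be read off the traversal times in Table 1. The short computation below only restates those numbers; the variable names are ours.

```python
# Traversal times in seconds, copied from Table 1.
split_time = {"ckt1": 327, "ckt2": 368, "ckt3": 3880, "ckt4": 3522}
split_plus_time = {"ckt1": 244, "ckt2": 309, "ckt3": 2840, "ckt4": 2359}


def time_reduction(old, new):
    """Percentage of traversal time saved when going from old to new."""
    return 100.0 * (old - new) / old


reductions = {
    ckt: round(time_reduction(split_time[ckt], split_plus_time[ckt]), 1)
    for ckt in split_time
}
```

The resulting reductions are 25.4%, 16.0%, 26.8% and 33.0% for ckt1 to ckt4, respectively.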

Even more convincing is to compare the two algorithms on a few specific 
partitioning problems. In Table 2 we pick three representative cases and report the 
sizes of the initial BDD and of the two partitions resulting from the split, as well as 
the time required by the split. SPLIT+ is consistently 8-10X faster than SPLIT, 
without a significant loss in accuracy, although occasionally it can produce a poor 
decomposition (as seen in the first line of Table 2). 





               SPLIT                                SPLIT+ 
Initial BDD    Partitions (BDD nodes)    Time       Partitions (BDD nodes)    Time 
331520         224731 / 215579           98.83      63834 / 268414            8.86 
100096         52527 / 52435             31.13      54293 / 47042             4.86 
105003         53321 / 53312             27.96      53321 / 53312             2.81 




Table 2. Comparing splitting algorithms on specific partitioning problems 
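
The per-case speed-up and the balance of each split can be derived from the rows of Table 2. The computation below uses only the figures reported there; the imbalance measure (larger partition divided by smaller partition) is our own choice for illustration.

```python
# Rows of Table 2: (initial size, SPLIT partitions, SPLIT time,
#                   SPLIT+ partitions, SPLIT+ time)
rows = [
    (331520, (224731, 215579), 98.83, (63834, 268414), 8.86),
    (100096, (52527, 52435), 31.13, (54293, 47042), 4.86),
    (105003, (53321, 53312), 27.96, (53321, 53312), 2.81),
]


def imbalance(parts):
    """Ratio of the larger partition to the smaller one (1.0 is ideal)."""
    return max(parts) / min(parts)


speedups = [t_split / t_plus for _, _, t_split, _, t_plus in rows]
split_imbalance = [imbalance(p) for _, p, _, _, _ in rows]
split_plus_imbalance = [imbalance(p) for _, _, _, p, _ in rows]
```

The first row is the poor decomposition mentioned in the text: the SPLIT+ partitions differ by a factor of about 4.2, while the SPLIT partitions are within about 4% of each other.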



4.2 Robustness for Both Verification and Falsification 

For the evaluation in this section, we selected several real-life verification 
and falsification problems that can be handled by all the algorithms: classic 
reachability (CLS), subsetting traversal using the Shortest Paths heuristic (SUB), 
partitioned traversal (PART) and prioritized traversal (PRIO). The point that we want 
to make here is that PRIO behaves consistently well across the spectrum. It is the 
only algorithm that is efficient and robust for both verification and falsification. As 
opposed to [13], our implementation of PART does not re-partition the frontiers and 
reachable states. 

As for PRIO, the priority queue is sorted by BDD size (the smallest frontiers are 
traversed first). This setting was uniformly a good choice, but different priority 
functions perform better on different examples, which justifies the need for a general 
priority queue mechanism. Also, the results obtained with the different priority 
functions, each with several thresholds, have been consistently good, which 
supports our claim about the robustness of this approach. The results for other priority 
functions are not included here, due to space constraints. 

ckt       ckt1              ckt2              ckt3              ckt4 
vars      81 lat/101 inp    79 lat/80 inp     136 lat/73 inp    129 lat/82 inp 
thresh    50000 nodes       100000 nodes      100000 nodes      100000 nodes 
          time     mem      time     mem      time     mem      time     mem 
CLS       174      169      253      92       2923     265      2271     224 
SUB       2763     245      568      108      Out      Out      Out      Out 
PART      299      169      457      95       3844     219      2550     190 
PRIO      244      169      309      98       2840     219      2359     185 


Table 3. Performance comparison on verification problems 



Let us look first at the verification results in Table 3 (ckt 1-4). We notice that CLS 
is still the fastest algorithm. However, PRIO comes close behind - for larger 
examples, it is only 10-20% slower and occasionally it beats the classic algorithm (as 
seen for ckt3). Also, PRIO has lower memory consumption and this trend becomes more 
pronounced as we lower the threshold or run more challenging examples. This is 
similar to the time/memory trade-off observed in the usage of partitioned transition 
relations [20] compared to monolithic ones. 

PART takes more time to complete these examples. We suspect that PRIO wins 
over PART due to its mixed BFS/DFS nature, which allows it to reach deeper states faster 
and converge in fewer iterations. As for SUB, its dead-end resolution algorithm rarely 
succeeds in converging to full reachability, and when it does, it is much slower than both 
PRIO and PART. This is due to the unbalanced partitioning used in SUB - the dense 
partition is explored first, but when it comes to traversing the non-dense one we are 
dealing with increasingly larger BDDs and a rapid degradation in performance. 

Table 4 reports falsification results on three buggy test cases (ckts 5-7). The time 
and memory data are measured only until the bug is encountered, due to the use of 
“on-the-fly” verification. As opposed to previous evaluations [1, 13] of subsetting 
traversal that aimed at high state coverage, our evaluation criteria measure the 
efficiency and robustness of the different algorithms with respect to bug finding. 

ckt       ckt5              ckt6              ckt7 
vars      195 lat/67 inp    129 lat/54 inp    136 lat/73 inp 
thresh    50000 nodes       50000 nodes       100000 nodes 
          time     mem      time     mem      time     mem 
CLS       944      234      1307     344      4655     494 
SUB       Out      Out      810      220      Out      Out 
PART      1262     155      1012     227      5650     210 
PRIO      470      128      540      157      1600     210 


Table 4. Performance comparison on falsification problems 








PRIO is clearly faster than both CLS and PART, again due to its mixed BFS/DFS
nature. It is no surprise that PART is even slower than CLS: both perform a BFS
traversal, except that PART does so more slowly and with lower memory consumption
(just as noticed in Table 3). When comparing performance, there is no a priori winner
between PRIO and SUB, but PRIO is definitely more robust. When SUB’s approximations
miss the closest bugs, it becomes harder and harder to recover from dead-ends and to
encounter the buggy states ignored previously. This is why SUB requires a high tuning
effort: often only the combination of a specific subsetting heuristic and a specific
threshold succeeds in finding the bug. By contrast, the use of balanced partitioning
in PRIO allows it to explore more states and eventually hit the bug. Different
priority queues or different thresholds have a smaller impact on the chances of
finding the bug, although they may affect the time required to find it.



4.3 Capacity Improvement 

One of the main benefits of PRIO is the capacity improvement over CLS, SUB and
PART. This can be noted in Table 5 for both verification (ckt 8-10) and falsification
problems (ckt 11-12). Both SUB and PART were run with the best tuning (i.e.,
different thresholds and approximation heuristics), and SUB actually succeeded in
handling the two falsification problems, but only with a very specific configuration:
the RUA approximation [9] and a threshold of 400000 nodes.

          ckt 8           ckt 9           ckt 10          ckt 11          ckt 12
          time    mem     time    mem     time    mem     time    mem     time    mem
SUB       Out     Out     Out     Out     Out     Out     2236*   41      [...]   59
PART      Out     Out     Out     Out     Out     Out     Out     Out     Out     Out
PRIO      14768   851     54324   237     86522   228     512     48      739     63

Table 5. Capacity comparison on verification and falsification problems



These results only confirm the trend noticed in section 4.2. While CLS had a slight 
edge for average verification problems, for the difficult examples PRIO has better 
chances for success than both SUB and PART. All the arguments mentioned above 
still apply here: PRIO wins against SUB due to the balanced partitioning, and is better 
than PART due to its mixed BFS/DFS strategy. 



5 Conclusions 



The ability of the same algorithm to address both verification and falsification
problems is critical in an industrial context. The prioritized traversal proposed here
achieves this goal by combining the best features of subsetting traversal [1] and
partitioned traversal [13]. As in [1], the mixed BFS/DFS strategy makes the algorithm
efficient for falsification problems. As in [13], using balancing instead of density as
the partitioning criterion makes partitioned traversal suitable for verification
problems. The results reported on Intel designs show a marked improvement over
existing exact and partial traversal techniques.

Another important contribution of this paper is the fast splitting algorithm. As it 
addresses the general BDD decomposition problem, we suspect that such an 
algorithm might be useful for many other applications relying on BDD technology. In 
the specific context of partitioned traversal the speed of the algorithm outweighs the 
loss in the quality of the partitions, resulting in a significant reduction of the overall 
traversal time. 

It is also worth noting that the use of prioritized traversal is not limited to
reachability analysis and invariant checking. It can be easily adapted to other
least-fixpoint computations, such as the evaluation of CTL formulas of type EU or EF.
Since the evaluation of such formulas involves a backward traversal of the state
space, one only has to replace the image operator with the pre-image operator in order
to get the corresponding dual algorithm.
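The duality between forward and backward traversal can be illustrated with a minimal explicit-state sketch. It is not the BDD-based prioritized traversal of this paper: the names `pre_image` and `backward_reach` and the toy transition relation are assumptions made for illustration only. The loop computes the least fixpoint of Z = goal ∪ pre(Z), which is the set of states satisfying EF goal.

```python
def pre_image(transitions, targets):
    """States that have at least one successor in `targets`."""
    return {s for (s, t) in transitions if t in targets}


def backward_reach(transitions, goal):
    """Least fixpoint of Z = goal | pre(Z): the states satisfying EF goal."""
    reached = set(goal)
    frontier = set(goal)
    while frontier:
        # Only states not seen before form the next frontier.
        frontier = pre_image(transitions, frontier) - reached
        reached |= frontier
    return reached


# Hypothetical toy system: 3 -> 0 -> 1 -> 2 (self-loop on 2); state 4 is isolated.
transitions = {(0, 1), (1, 2), (2, 2), (3, 0), (4, 4)}
print(sorted(backward_reach(transitions, {2})))
```

Replacing `pre_image` by the corresponding image function turns the same loop into forward reachability.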







6 References 



[1] K. Ravi, F. Somenzi, “High Density Reachability Analysis”, in Proceedings of
ICCAD’95.

[2] M. Ganai, A. Aziz, “Efficient Coverage Directed State Space Search”, in
Proceedings of IWLS’98.

[3] J. Yuan, J. Shen, J. Abraham, A. Aziz, “On Combining Formal and Informal
Verification”, in Proceedings of CAV’97.

[4] C. Yang, D. Dill, “Validation with Guided Search of the State Space”, in
Proceedings of DAC’98.

[5] A. Aziz, J. Kukula, T. Shiple, “Hybrid Verification Using Saturated Simulation”,
in Proceedings of DAC’98.

[6] R. Bryant, “Graph-Based Algorithms for Boolean Function Manipulation”, IEEE
Transactions on Computers, C-35:677-691, August 1986.

[7] K.L. McMillan, “Symbolic Model Checking”, Kluwer, 1993.

[8] T.R. Shiple, “Formal Analysis of Synchronous Circuits”, PhD thesis, University of
California at Berkeley, 1996.

[9] K. Ravi, K.L. McMillan, T.R. Shiple, F. Somenzi, “Approximation and Decomposition
of Binary Decision Diagrams”, in Proceedings of DAC’98.

[10] O. Coudert, J. Madre, “A Unified Framework for the Formal Verification of
Sequential Circuits”, in Proceedings of ICCAD’90.

[11] K. Ravi, F. Somenzi, “Efficient Fixpoint Computation for Invariant Checking”, in
Proceedings of ICCD’99, pp. 467-474.

[12] B. Lin, R. Newton, “Implicit Manipulation of Equivalence Classes Using Binary
Decision Diagrams”, in Proceedings of ICCD’91.

[13] G. Cabodi, P. Camurati, S. Quer, “Improved Reachability Analysis of Large Finite
State Machines”, in Proceedings of ICCAD’96.

[14] A. Narayan, J. Jain, M. Fujita, A. Sangiovanni-Vincentelli, “Partitioned ROBDDs -
A Compact, Canonical and Efficiently Manipulable Representation for Boolean
Functions”, in Proceedings of ICCAD’96.

[15] R. Fraer, G. Kamhi, L. Fix, M. Vardi, “Evaluating Semi-Exhaustive Verification
Techniques for Bug Hunting”, in Proceedings of SMC’99.

[16] I. Beer, S. Ben-David, A. Landver, “On-the-Fly Model Checking of RCTL Formulas”,
in Proceedings of CAV’98.

[17] G. Cabodi, P. Camurati, S. Quer, “Improving the Efficiency of BDD-Based Operators
by Means of Partitioning”, IEEE Transactions on CAD, pp. 545-556, May 1999.

[18] F. Somenzi, “CUDD: CU Decision Diagram Package - Release 2.3.0”, Technical
Report, Dept. Electrical and Computer Engineering, University of Colorado, Boulder.

[19] R. H. Hardin, R. P. Kurshan, K. L. McMillan, J. A. Reeds, N. J. A. Sloane,
“Efficient Regression Verification”, Int’l Workshop on Discrete Event Systems
(WODES ’96), 19-21 August, Edinburgh, IEE, London, 1996, pp. 147-150.

[20] J. R. Burch, E. M. Clarke, D. E. Long, K. L. McMillan, D. L. Dill, “Symbolic Model
Checking for Sequential Circuit Verification”, IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, Vol. 13, No. 4, pp. 401-424, April 1994.




Regular Model Checking 



Ahmed Bouajjani¹, Bengt Jonsson², Marcus Nilsson*², and Tayssir Touili¹



¹ Liafa, Univ. Paris 7, Case 7014, 2 place Jussieu, 75251 Paris Cedex 05, France
{Ahmed.Bouajjani,Tayssir.Touili}@liafa.jussieu.fr
² Dept. of Computer Systems, P.O. Box 325, S-751 05 Uppsala, Sweden
{bengt,marcusn}@docs.uu.se



Abstract. We present regular model checking, a framework for algorithmic
verification of infinite-state systems with, e.g., queues, stacks, integers, or a
parameterized linear topology. States are represented by strings over a finite
alphabet and the transition relation by a regular length-preserving relation on
strings. Major problems in the verification of parameterized and infinite-state
systems are to compute the set of states that are reachable from some set of initial
states, and to compute the transitive closure of the transition relation. We present
two complementary techniques for these problems. One is a direct automata-theoretic
construction, and the other is based on widening. Both techniques are incomplete in
general, but we give sufficient conditions under which they work. We also present a
method for verifying $\omega$-regular properties of parameterized systems, by
computation of the transitive closure of a transition relation.



1 Introduction 

This paper presents regular model checking, intended as a uniform paradigm for
algorithmic verification of several classes of parameterized and infinite-state
systems. Regular model checking considers systems whose states can be represented as
finite strings of arbitrary length over a finite alphabet. This includes parameterized
systems consisting of an arbitrary number of homogeneous finite-state processes
connected in a linear or ring-formed topology, and systems that operate on queues,
stacks, integers, and other data structures that can be represented by sequences of
symbols.

Regular model checking can be seen as symbolic model checking, in which regular sets
of words over a finite alphabet are used as a symbolic representation of sets of
states, and in which regular length-preserving relations between words, usually in the
form of finite-state transducers, represent transition relations. This framework has
been advocated by, e.g., Kesten et al. [KMM+97] and by Boigelot and Wolper [WB98], as
a uniform framework for analyzing several classes of parameterized and infinite-state
systems, where automata-theoretic algorithms for manipulation of regular sets can be
exploited. Such algorithms have been implemented, e.g., in the Mona [HJJ+96] and
MoSel [KMMG97] packages.

A major problem in regular model checking is that the standard iteration-based
methods used for finite-state systems (e.g., [BCMD92]) to compute, e.g., the set of
reachable states, are guaranteed to terminate only if there is a bound on the distance
(in number of transitions) from the initial configurations to any reachable
configuration. An analogous observation holds if we perform reachability analysis
backwards, from a set of “unsafe” configurations, as in [KMM+97]. In general, a
parameterized or infinite-state system does not have such a bound. To explore the
entire state space, one must therefore be able to calculate the effect of arbitrarily
long sequences of transitions. For instance, consider a transition of a parameterized
system in which a process passes a token to its neighbor. The effect of an arbitrarily
long sequence of such transitions will be to pass the token to any other process
through an arbitrary sequence of neighbors.

The problem of calculating the effect of arbitrarily long sequences of transitions
has been addressed for certain classes of systems, e.g., systems with unbounded FIFO
channels [BG96,BGWW97,BH97,ABJ98], systems with pushdown stacks [BEM97,Cau92,FWW97],
systems with counters [BW94,CJ98], and certain classes of parameterized systems
[ABJN99]. A more uniform calculation of the transitive closure of a transition
relation was presented in our previous work [JN00] for the case that the transition
relation satisfies a condition of bounded local depth. This construction was used to
verify safety properties of several parameterized algorithms and algorithms that
operate on unbounded FIFO channels or on counters.

In this paper, we develop the regular model checking paradigm further. We give a
uniform presentation of the program model, which is simpler than that used in [JN00].
The main part of the paper considers the problem of calculating the set of
configurations reachable from a set of initial configurations via a transition
relation, and the problem of computing the transitive closure of a transition
relation. We present two complementary techniques to attack these problems:

- The first technique is an automata-theoretic construction, which uses the standard
subset-construction and minimization techniques from automata theory. The
construction succeeds if the resulting automaton is finite-state. It generalizes the
transitive closure construction in [JN00].

- The second technique is based on computing fixpoints using widening, in the spirit
of [CC77]. We propose exact widening techniques for regular languages and show how to
use them for reachability analysis and for computing transitive closures.

As another main contribution, we show how to verify liveness properties (or, more
generally, $\omega$-regular properties). When applied to parameterized systems, our
method allows one to prove, e.g., absence of starvation in resource allocation
algorithms. In general, it allows one to prove a liveness property, given an arbitrary
number of parameterized fairness requirements. For instance, in a parameterized
algorithm there is typically a fairness requirement associated with each process. Our
method is based on a reduction of the model-checking problem for $\omega$-regular
properties to the problem of finding fair loops, which can be detected after having
computed the transitive closure of a transition relation.

Both the automata-theoretic and the widening-based techniques presented in this paper
have been implemented and tested on several examples of parameterized and
infinite-state systems, including different parameterized algorithms for mutual
exclusion (e.g., the Bakery algorithm by Lamport, Burns' protocol, Dijkstra's
algorithm), unbounded FIFO-channel systems (e.g., the Alternating-bit protocol), and
counter systems (e.g., the Sliding Window protocol with unbounded sequence numbers).

Outline. In the next section, we present the framework of regular model checking, and
define the verification problems that are considered in the paper. In Section 3 we
present an automata-theoretic construction for computing the transitive closure of a
transition relation. The construction can also be used to compute reachability sets.
Sect. 4 presents widening techniques for the same problem. A method for model checking
of $\omega$-regular properties is given in Sect. 5, and concluding remarks are given
in Sect. 6.

Related Work. Previous work on the general aspects of regular model checking, and on
analyzing classes of systems, e.g., pushdown systems, parameterized systems, systems
with FIFO channels, or with counters, has already been mentioned earlier in the
introduction. The acceleration techniques presented in this paper are able to emulate
the acceleration operations for FIFO channels reported in [BG96].

Widening techniques have been introduced in [CC77] in order to speed up 
the computation of fixpoints in the framework of abstract interpretation. Many 
works have proposed widening operators for different kinds of systems, e.g., 
widening operators defined on representations based on convex polyhedra for 
use in the analysis of systems operating on integers or reals (e.g., [CH78,Hal93]), 
or widening operators on automata [LHR97]. All these techniques compute upper 
approximations of the desired fixpoint. 

Other researchers, e.g., [FO97,Sis97], use regular sets in a deductive framework,
where basic manipulations on regular sets are performed automatically. These methods
are based on proving an invariant given by the user or by some invariant generation
technique, but are not fully automatic. In [FO97], the authors show how to check that
a given regular set is the reachability set of a special kind of relation. We use
their result to prove that our widening technique is exact. However, they do not
provide any technique to find the fixpoint automatically.

2 Preliminaries 

2.1 Model 

We introduce the program model used in regular model checking.

Let $\Sigma$ be a finite alphabet. As usual, $\Sigma^*$ is the set of finite words
over $\Sigma$. A relation $R$ on $\Sigma^*$ is regular and length-preserving if $w$
and $w'$ are of equal length whenever $(w, w') \in R$, and the set
$\{(a_1, a'_1) \cdots (a_n, a'_n) : (a_1 \cdots a_n, a'_1 \cdots a'_n) \in R\}$ is a
regular subset of $(\Sigma \times \Sigma)^*$. In the following, we will implicitly
understand that a regular relation is also length-preserving. A regular relation can
be conveniently recognized by a finite-state transducer over $(\Sigma \times \Sigma)$.

Definition 1. A program is a triple $\mathcal{P} = (\Sigma, \phi_I, R)$ where

$\Sigma$ is a finite alphabet,

$\phi_I$ is a regular set over $\Sigma$, denoting a set of initial configurations, and
$R$ is a regular relation on $\Sigma^*$, sometimes called the transition relation. □

A configuration $w$ of a program $\mathcal{P}$ is a word $a_1 a_2 \cdots a_n$ over
$\Sigma$. We denote by $R_{id} = \{(w, w') : w = w'\}$ the identity relation on
configurations. Regular relations can be composed to yield new regular relations. For
two regular relations $R$ and $R'$, their union $R \cup R'$, intersection
$R \cap R'$, sequential (or relational) composition $R \circ R'$, and concatenation
$R \cdot R'$ are regular.
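As a rough illustration of this model, the sketch below represents a length-preserving regular relation as a nondeterministic automaton whose letters are pairs of symbols, and builds the sequential composition of two such relations by a product construction. The class name `Transducer`, the helper `compose`, the reading of composition as "first the left relation, then the right one", and the toy token-moving relation are all assumptions made for this example, not definitions taken from the paper.

```python
from itertools import product


class Transducer:
    """Length-preserving transducer: an NFA whose letters are pairs (a, a')."""

    def __init__(self, initial, accepting, delta):
        self.initial = initial            # single initial state
        self.accepting = set(accepting)   # accepting states
        self.delta = delta                # dict: (state, (a, a')) -> set of states

    def relates(self, w, w2):
        """Return True iff (w, w2) belongs to the relation."""
        if len(w) != len(w2):
            return False
        current = {self.initial}
        for pair in zip(w, w2):
            current = {r for q in current for r in self.delta.get((q, pair), ())}
        return bool(current & self.accepting)


def compose(r1, r2, alphabet):
    """Product construction for 'first r1, then r2' (sequential composition)."""
    initial = (r1.initial, r2.initial)
    delta, seen, todo = {}, {initial}, [initial]
    while todo:
        q1, q2 = todo.pop()
        for a, b, c in product(alphabet, repeat=3):  # b is the hidden middle letter
            for s1 in r1.delta.get((q1, (a, b)), ()):
                for s2 in r2.delta.get((q2, (b, c)), ()):
                    delta.setdefault(((q1, q2), (a, c)), set()).add((s1, s2))
                    if (s1, s2) not in seen:
                        seen.add((s1, s2))
                        todo.append((s1, s2))
    accepting = {s for s in seen if s[0] in r1.accepting and s[1] in r2.accepting}
    return Transducer(initial, accepting, delta)


# Toy relation: move a token "t" one cell to the right ("n" stands for "no token").
PASS = Transducer(
    initial=0,
    accepting={2},
    delta={
        (0, ("n", "n")): {0},
        (0, ("t", "n")): {1},
        (1, ("n", "t")): {2},
        (2, ("n", "n")): {2},
    },
)
TWO_STEPS = compose(PASS, PASS, "nt")
print(PASS.relates("tnn", "ntn"), TWO_STEPS.relates("tnn", "nnt"))
```

Union and intersection can be obtained in the same style, since a length-preserving relation is just a regular language over pairs.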



2.2 Modeling Parameterized and Infinite-State Systems 

We give two examples of different classes of systems that can be modeled in the
framework of the preceding subsection. More examples can be found in [JN00].

Parameterized Systems. Consider a parameterized system consisting of an arbitrary
number of homogeneous finite-state processes connected in a linear topology. A
configuration of this system can be represented by a word over an alphabet consisting
of the set of states for each process, where the length of the word equals the number
of processes. A particular example of such a system is an array of processes that pass
a token from the left to the right. Each process can be in one of two states, $\bot$
or $t$, where $\bot$ denotes that the process does not have the token, and $t$ denotes
that the process has the token. We model this system as the program
$\mathcal{P} = (\Sigma, \phi_I, R)$ where

- $\Sigma = \{\bot, t\}$ is the set of states of a process,

- $\phi_I = t\,\bot^*$ is the set of initial configurations, in which the leftmost
process has the token, and

- $R = (\bot,\bot)^* \cdot (t,\bot) \cdot (\bot,t) \cdot (\bot,\bot)^* \;\cup\;
(\bot,\bot)^* \cdot ((\bot,\bot) \cup (t,t)) \cdot (\bot,\bot)^*$
is the union of two relations, of which the first denotes the passing of the token
from a process to its right neighbor, and the second denotes an idling computation
step. Note that the transition relation is implicitly constrained by the invariant
that there is exactly one token in the system.
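The token-passing relation can be tried out directly with an ordinary regular expression over pairs. In the sketch below each pair (a, a') is written as two characters and the letter "n" stands for the no-token symbol; this encoding and the names `PASS`, `IDLE` and `step` are assumptions made for the illustration.

```python
import re

# Each pair (a, a') is written as two characters; "n" stands for "no token".
PASS = r"(nn)*(tn)(nt)(nn)*"        # token moves to the right neighbour
IDLE = r"(nn)*((nn)|(tt))(nn)*"     # idling step
R = re.compile(f"(?:{PASS})|(?:{IDLE})")


def step(w, w2):
    """True iff (w, w2) is in R; words of different length are never related."""
    if len(w) != len(w2):
        return False
    pairs = "".join(a + b for a, b in zip(w, w2))
    return R.fullmatch(pairs) is not None


print(step("tnn", "ntn"))   # one passing step
print(step("tnn", "nnt"))   # would need two steps
print(step("ttn", "ttn"))   # two tokens: excluded by the shape of R
```

The last call shows the implicit invariant at work: a word with two tokens is not related to anything, because every position outside the rewritten one must carry the pair of no-token symbols.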

Systems Communicating over Unbounded FIFO Channels. As another example, consider
systems of finite-state processes that communicate over unbounded FIFO channels.
Assume for simplicity that there is only one FIFO channel, containing a sequence of
messages in the finite set $M$. Let $Q$ be the finite set of combinations of control
states of the processes. A configuration of this system can be modeled as a word of
the form

$q \;\bot\bot \cdots \bot\; m_1 m_2 \cdots m_n \;\bot\bot \cdots \bot$

where $q \in Q$ is the current control state, and $m_1 m_2 \cdots m_n$ is the sequence
of messages in the channel. In order to allow messages to be added to and removed from
the channel, we add an arbitrary amount of “padding symbols” $\bot$ before and after
the sequence of messages. Thus, we can model this system as the program
$\mathcal{P} = (\Sigma, \phi_I, R)$, where

- $\Sigma = Q \cup M \cup \{\bot\}$,

- $\phi_I$ is the regular set $q_i\,\bot^*$ of configurations with no message in the
channel, and the initial control state $q_i$,

- $R$ is the union of several relations: one for each operation in the system. An
operation which sends a message $m \in M$ while making a transition from control state
$q$ to $q'$ is modeled by the relation

$(q, q') \cdot (\bot,\bot)^* \cdot (R_{id} \cap M^2)^* \cdot (\bot, m) \cdot (\bot,\bot)^*$

Receive operations are modeled in a similar way.



2.3 Verification 

Let $R$ be a regular relation. We use $R^+$ to denote the transitive closure of $R$,
$R^*$ to denote the reflexive and transitive closure of $R$, and $R^{-1}$ to denote
the inverse of $R$. For a regular set $\phi$ of configurations, $R(\phi)$ denotes the
image $\{w' : \exists w \in \phi.\ (w, w') \in R\}$ of $\phi$ under $R$.

In this paper, we will consider two verification problems: 

- Computing Reachability Sets: Given a regular set $\phi$ of configurations and a
regular relation $R$, compute the reachability set $R^*(\phi)$. This can be used for
checking whether some configuration in a set $\phi_F$ is reachable from a set $\phi_I$
of initial configurations, either by checking whether
$\phi_F \cap R^*(\phi_I) = \emptyset$ (forward reachability analysis), or whether
$\phi_I \cap (R^{-1})^*(\phi_F) = \emptyset$ (backward reachability analysis). If $R$
is a union $R_1 \cup \cdots \cup R_k$ of several regular relations, then one can also
compute the set of reachable configurations stepwise, by repeatedly extending the set
$\phi$ of currently reached configurations by $R_i^*(\phi)$ for some $i$.

- Computing Transitive Closure: Given a regular relation $R$, compute its transitive
closure $R^+$. The transitive closure can be used for finding loops of parameterized
systems. The relation $R^+ \cap R_{id}$ represents the identity relation on the set of
configurations that can reach themselves via a non-empty sequence of computation
steps. To obtain the loops that can actually occur in an execution, we intersect this
set with the set of reachable configurations, computed as a reachability set. The
transitive closure can also be used for computing the reachability set.

A straightforward approach to computing $R^*(\phi)$ is to compute the sequence of
unions $\bigcup_{i=0}^{n} R^i(\phi)$ for $n = 1, 2, 3, \ldots$ until a fixpoint is
reached, i.e., $\bigcup_{i=0}^{n} R^i(\phi) = \bigcup_{i=0}^{n+1} R^i(\phi)$ for some
$n$. This approach is guaranteed to terminate for finite-state systems [BCMD92], but
in general not for parameterized and infinite-state systems, as observed, e.g., in
[ABJN99]. The analogous observation holds for the approach of computing $R^+$ through
the sequence of unions $\bigcup_{i=1}^{n} R^i$ for $n = 1, 2, 3, \ldots$ until
convergence.
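The following sketch makes the termination problem concrete for the token-passing array, with "n" standing for the no-token symbol. For each fixed number of processes the naive iteration converges, but the number of rounds it needs grows with the length of the word, so no single bound works for the whole family. The function names `image` and `reach` are assumptions, and sets of configurations are enumerated explicitly instead of being represented by automata.

```python
def image(configs):
    """One application of the token-passing step to a set of configurations."""
    out = set()
    for w in configs:
        i = w.find("t")
        if i != -1 and i + 1 < len(w):
            out.add(w[:i] + "nt" + w[i + 2:])   # token moves one cell right
    return out


def reach(initial):
    """Iterate the unions of R^i(initial) until nothing new is added."""
    reached = set(initial)
    rounds = 0
    while True:
        new = image(reached) - reached
        if not new:
            return reached, rounds
        reached |= new
        rounds += 1


for n in (2, 4, 8):
    configs, rounds = reach({"t" + "n" * (n - 1)})
    print(n, len(configs), rounds)
```

In this toy setting the number of productive rounds is always the word length minus one, which is exactly the distance from the initial configuration to the farthest reachable one.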

We present two complementary techniques which are applicable to both of 
the above problems. 

- A direct automata-theoretic construction for computing $R^*(\phi)$ or $R^+$, which
is based on the transducer for $R$. It is presented in Section 3.

- A technique based on widening, in the spirit of [CC77], is presented in Section 4.
The technique is based on observing successive approximations to $R^*(\phi)$, and
trying to guess the limit automatically. The guess can always be checked to be an
upper approximation, and to be exact under some conditions. The technique can also be
applied to the computation of $R^*$, by viewing it as a regular set over
$\Sigma \times \Sigma$.

3 Automata Theoretic Construction of the Transitive 
Closure 



In this section, we present a technique for computing $R^+$, which attempts to compute
a minimal deterministic transducer that recognizes $R^+$ using the standard
subset-construction and minimization techniques from elementary automata theory. The
technique can also, with obvious modifications, be used for computing $R^*(\phi)$. We
will here concentrate on the transitive closure construction, and only briefly
summarize the reachability set construction at the end of this section.

Our transitive closure construction is not guaranteed to terminate, in particular if
$R^+$ is not regular. However, if $R$ has bounded local depth, a concept introduced in
[JN00], a slightly modified version of our construction will yield a finite
transducer.



3.1 The Transducer Construction 

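For the remainder of this section, let $R$ be a regular relation on $\Sigma$,
represented as a finite-state transducer $R = (Q, q_0, \delta, F)$ where $Q$ is the
set of states, $q_0$ is the initial state,
$\delta : (Q \times (\Sigma \times \Sigma)) \mapsto 2^Q$ is the transition function,
and $F \subseteq Q$ is the set of accepting states. For each pair $(w, w')$ in $R^+$
there is a sequence $w^0, w^1, \ldots, w^m$ of configurations such that $w = w^0$,
$w' = w^m$ and $(w^{i-1}, w^i) \in R$ for $0 < i \le m$. Let $w^i$ be the word
$a_1^i a_2^i \cdots a_n^i$ of length $n$. Then $(w^{i-1}, w^i) \in R$ means that there
is a run $q_0^i q_1^i \cdots q_n^i$ of $R$ which accepts the word
$(a_1^{i-1}, a_1^i)(a_2^{i-1}, a_2^i) \cdots (a_n^{i-1}, a_n^i)$. Let us organize
these runs into a matrix of the form:

\[
\begin{array}{ccccccc}
q_0^1 & \xrightarrow{(a_1^0,\,a_1^1)} & q_1^1 & \xrightarrow{(a_2^0,\,a_2^1)} & q_2^1 \;\cdots\; q_{n-1}^1 & \xrightarrow{(a_n^0,\,a_n^1)} & q_n^1 \\
q_0^2 & \xrightarrow{(a_1^1,\,a_1^2)} & q_1^2 & \xrightarrow{(a_2^1,\,a_2^2)} & q_2^2 \;\cdots\; q_{n-1}^2 & \xrightarrow{(a_n^1,\,a_n^2)} & q_n^2 \\
\vdots & & \vdots & & \vdots & & \vdots \\
q_0^m & \xrightarrow{(a_1^{m-1},\,a_1^m)} & q_1^m & \xrightarrow{(a_2^{m-1},\,a_2^m)} & q_2^m \;\cdots\; q_{n-1}^m & \xrightarrow{(a_n^{m-1},\,a_n^m)} & q_n^m
\end{array}
\]

with $m$ rows, where each row shows a run of the transducer $R$ with $n$ transitions.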

The first step in our construction of $R^+$ is to regard the above matrix as a single
run of another transducer, whose states are columns (i.e., sequences) of the form
$q^1 q^2 \cdots q^m$ for $m \ge 1$, and whose transitions represent the relationship
between adjacent columns in a matrix of the above form. More precisely, define the
column transducer for $R^+$ as the tuple $(Q^+, q_0^+, \Delta, F^+)$ where

- $Q^+$ is the set of non-empty sequences of states of $R$,

- $q_0^+$ is the set of non-empty sequences of initial states of $R$,

- $\Delta : (Q^+ \times (\Sigma \times \Sigma)) \mapsto 2^{Q^+}$ is defined as
follows: for any columns $q^1 q^2 \cdots q^m$ and $r^1 r^2 \cdots r^m$, and pair
$(a, a')$, we have $r^1 r^2 \cdots r^m \in \Delta(q^1 q^2 \cdots q^m, (a, a'))$ iff
there are $a^0, a^1, \ldots, a^m$ with $a = a^0$ and $a' = a^m$ such that
$r^i \in \delta(q^i, (a^{i-1}, a^i))$ for $1 \le i \le m$.
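A single transition of the column transducer can be computed by threading the intermediate letters through the column from top to bottom. The sketch below does this for an explicitly given transition function; the name `column_step`, the dictionary encoding of the transition function, and the three-state token-passing transducer used as input are assumptions made for the illustration.

```python
def column_step(delta, alphabet, column, a, a2):
    """All columns r^1 ... r^m that can follow `column` on the pair (a, a2)."""
    # Each partial result is (letter produced so far, successor states so far).
    partial = {(a, ())}
    for q in column:
        nxt = set()
        for letter, states in partial:
            for b in alphabet:                       # guess the next letter
                for r in delta.get((q, (letter, b)), ()):
                    nxt.add((b, states + (r,)))
        partial = nxt
    # The letter produced by the last row must be the output letter a2.
    return {states for letter, states in partial if letter == a2}


# Hypothetical transducer for "move the token one cell to the right".
# States: 0 = token not yet seen, 1 = token just taken, 2 = token handed over.
delta = {
    (0, ("n", "n")): {0},
    (0, ("t", "n")): {1},
    (1, ("n", "t")): {2},
    (2, ("n", "n")): {2},
}

col = (0, 0)                                  # two rows, both in the initial state
for pair in [("t", "n"), ("n", "n"), ("n", "t")]:
    (col,) = column_step(delta, "nt", col, *pair)
    print(pair, col)
```

The run ends in the column (2, 2), whose states are both accepting in this toy transducer, reflecting that two passing steps relate the word tnn to nnt.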

It is not difficult to show that the column transducer for $R^+$ accepts exactly the
relation $R^+$. The only problem is that it has infinitely many states. We will
therefore determinize it using the standard subset construction, in the hope of
decreasing the number of states. Let $x, y$ range over columns and $X, Y$ over sets of
columns. The subset construction applied to the column transducer for $R^+$ yields the
automaton $(2^{Q^+}, q_0^+, \rho_\Delta, \mathcal{F})$ whose states are sets of
columns of states of $R$, whose initial state is the set $q_0^+$ of columns of initial
states, whose set $\mathcal{F}$ of accepting states is the set of states with at least
one column in $F^+$, and whose transition function $\rho_\Delta$ is the lifting of
$\Delta$ to sets: $\rho_\Delta(X, (a, a')) = \bigcup_{x \in X} \Delta(x, (a, a'))$. We
note that for each pair $(a, a')$, the relation
$\{(x, y) : y \in \Delta(x, (a, a'))\}$ is regular, so that a transducer which
implements the function $\rho_\Delta$ can be constructed directly from the definition
of $\Delta$. It follows that if $X$ is a regular set of columns, then
$\rho_\Delta(X, (a, a'))$ is also a regular set of columns, which can be constructed
by composing the automaton for $X$ with that for $\rho_\Delta$ and projecting onto the
“next column”.

In most cases, the subset construction does not yield a finite automaton. We therefore
try to make it smaller by identifying equivalent sets of columns during the
construction (cf. the minimization of deterministic finite automata). For a set $X$ of
columns (i.e., a state in the subset automaton), let $\mathit{suff}(X)$ denote the set
of suffixes of $X$, i.e., the set of words $\pi \in (\Sigma \times \Sigma)^*$ such
that $\rho_\Delta(X, \pi) \in \mathcal{F}$. Two sets $X, Y$ of columns are equivalent
if $\mathit{suff}(X) = \mathit{suff}(Y)$. We employ a simple (and incomplete)
technique to detect equivalent sets, based on saturation. The basic idea is to extend
(saturate) each set $X$ of columns by additional columns $x$ such that
$\mathit{suff}(\{x\}) \subseteq \mathit{suff}(X)$. Hopefully, two equivalent sets of
columns will become identical after saturation, in which case they will be identified
during the subset construction.

Let us present the saturation rule. A state $q$ in the original transducer $R$ is a
copying state if $\mathit{suff}(\{q\}) \subseteq R_{id}$, i.e., if its suffixes only
perform copying of words. We use the following saturation rule:

- if $xy \in X$ or if $xqqy \in X$, where $x$ and $y$ are (possibly empty) columns and
$q$ is a copying state, then add $xqy$ to $X$.

It is not difficult to see that
$\mathit{suff}(\{xqy\}) \subseteq \mathit{suff}(\{xy\})$ and that
$\mathit{suff}(\{xqy\}) \subseteq \mathit{suff}(\{xqqy\})$, implying the soundness of
the saturation rule. Let $\lceil X \rceil$ denote the saturation of $X$, i.e., the
least set of columns containing $X$ which is closed under the above rule. To saturate
a regular set of columns is an easy operation on the automaton which represents the
set.

It follows from the above construction that the transducer obtained after saturation
recognizes $R^+$. We summarize the discussion in the following theorem.

Theorem 1. Let $R = (Q, q_0, \delta, F)$ be a finite-state transducer. Then the
deterministic (possibly infinite-state) transducer
$(\mathcal{Q}, \lceil q_0^+ \rceil, \Delta, \mathcal{F})$, where

- $\mathcal{Q}$ is the set of saturated regular subsets of $Q^+$,

- $\Delta : (\mathcal{Q} \times (\Sigma \times \Sigma)) \mapsto \mathcal{Q}$ is
defined by $\Delta(X, (a, a')) = \lceil \rho_\Delta(X, (a, a')) \rceil$,

- $\mathcal{F}$ are the sets with at least one sequence in $F^+$,

accepts the relation $R^+$. □

It follows that if the set of reachable states in the above automaton is finite, then
$R^+$ is regular. We can then, using standard techniques, obtain a minimal
deterministic finite-state transducer which recognizes $R^+$.



3.2 A Sufficient Condition for Termination 

In [JN00], it was shown that $R^+$ is regular under some sufficient conditions on a
regular relation $R$. We will restate these conditions and note that the construction
given in the previous section, with slight modifications, will terminate under these
conditions. It should be noted that these conditions are merely sufficient conditions
for regularity, and that there are many other situations in which our construction of
$R^+$ yields a finite-state transducer.

Let $R$ be of the form $\phi_L \cdot R_l \cdot \phi_R$ where

- $\phi_L \subseteq R_{id}$ is a left context, i.e., it copies a regular language which
can be accepted by a deterministic finite automaton which has a unique accepting
state, and which has no transitions from an accepting to a non-accepting state,

- $R_l$ is any regular relation,

- $\phi_R \subseteq R_{id}$ is a right context, i.e., the mirror image of some left
context.

We say that $R$ has local depth $k$ if for each $(w, w') \in R^+$ there is a sequence
$w = w^0, w^1, \ldots, w^m = w'$ such that for each $1 \le i \le m$ we can find
indices $l_i$ and $u_i$ such that

- $(a_1^{i-1} a_2^{i-1} \cdots a_{l_i-1}^{i-1},\; a_1^i a_2^i \cdots a_{l_i-1}^i) \in \phi_L$,

- $(a_{u_i+1}^{i-1} a_{u_i+2}^{i-1} \cdots a_n^{i-1},\; a_{u_i+1}^i a_{u_i+2}^i \cdots a_n^i) \in \phi_R$, and

- $(a_{l_i}^{i-1} \cdots a_{u_i}^{i-1},\; a_{l_i}^i \cdots a_{u_i}^i) \in R_l$,

where $w^i = a_1^i a_2^i \cdots a_n^i$, and such that for each position $p$ (with
$1 \le p \le n$), there are at most $k$ indices $i$ with $1 \le i \le m$ such that
$l_i \le p \le u_i$. Intuitively, the decomposition of $R$ into
$\phi_L \cdot R_l \cdot \phi_R$ serves to structure $R$ as a local rewriting $R_l$ in
a “context”, represented by $\phi_L$ and $\phi_R$. A relation with local depth $k$
never needs to rewrite any element of a word more than $k$ times to relate two words.

Theorem 2. [JN00] If $R$ has local depth $k$, for any $k$, the transitive closure
$R^+$ is regular and can be recognized by a transducer of size exponential in $k$. □

The proof of the above theorem divides the set of columns into a finite set 
of equivalence classes. We will try to apply the same reasoning to the saturated 
columns in the construction of Theorem 1. 

Let a copied state be a state whose prefixes are a subset of the identity relation, in
analogy with copying states. The key of the proof is the observation that all sets $X$
of columns generated in the subset construction satisfy the following closure
property:

- if $xqy \in X$, where $x$ and $y$ are (possibly empty) columns and $q$ is a copied
state, then $xzy \in X$ for any $z \in q^*$.

It can be shown that the saturated columns corresponding to sequences of
configurations satisfying the conditions of local depth $k$ are built up from sets of
the form

$X_0\, q_0\, X_1\, q_1 \cdots X_{n-1}\, q_{n-1}\, X_n$

with $n \le k$ and where each $q_i$ with $0 \le i < n$ is a state in the transducer
for $R_l$ and each $X_i$ with $0 \le i \le n$ is one of the following sets, where
$q_L$ is a copied state and $q_R$ is a copying state:

1. $q_R^*$   2. $(q_L + q_R)^*$   3. $(q_L + q_R)^*\, q_R\, (q_L + q_R)^*$

4. $q_L (q_L + q_R)^*$   5. $(q_L + q_R)^*\, q_R$   6. $q_L (q_L + q_R)^*\, q_R$   7. $q_L^*$

To obtain the termination result, we therefore restrict the columns in the
construction to have up to $k$ states from the transducer for $R_l$. The conditions on
$R$ ensure that it is enough to consider such columns.

3.3 Computing Reachable Configurations 

We will finally sketch the modifications for computing instead the set $R^*(\phi)$,
where $\phi$ is a regular set of configurations. Assume that $\phi$ is recognized by a
finite automaton. In the construction of Section 3.1, a run of the automaton for
$\phi$ will replace the transducer run in the first row of the matrix, with the
obvious modifications.







The end result, corresponding to Theorem 1, is a (possibly infinite-state) 
automaton that recognizes $R^*(\phi)$, with a starting state built from $p_0$ and $q_0$, 
where $p_0$ is the starting state of an automaton that recognizes $\phi$ and $q_0$ is 
the starting state of a transducer for R. 



4 Widening Based Techniques 

We present in this section techniques for computing the effect of iterating a 
regular relation. These techniques are based on using exact widening operations 
in order to speed up the calculation of a regular fixpoint. 

Roughly speaking, our techniques consist in (1) guessing automatically the 
image of iterating a relation starting from some given regular set, and (2) deci- 
ding whether this guess is correct. 

The guessing technique we use consists in, given a relation $R$ and a set $\phi$, 
comparing the sets $\phi$ and $R(\phi)$ in order to detect some “growth” (e.g., $R(\phi) = 
\phi \cdot \Delta$ for some regular $\Delta$), and then extrapolating by guessing that each application 
of $R$ will have the same effect (e.g., we guess that the limit could be $\phi \cdot \Delta^*$). Then, 
we apply a simple fixpoint test which ensures (in general) that the guessed set is 
an upper approximation of the set $R^*(\phi)$. This means that using our widening 
techniques we can capture in one step (at least) all the configurations reachable 
from $\phi$ by $R^*$. Moreover, we show that under some conditions on $R$, the fixpoint 
test in fact allows us to decide the exactness of our guess (i.e., that the guessed set 
is precisely $R^*(\phi)$). 

4.1 Widening on Regular Sets 

Widening is applied during the iterative construction of the set of reachable 
configurations in order to help termination. Given a set of configurations $\phi \subseteq \Sigma^*$ 
and a relation $R$, a widening step consists in guessing the result of iterating $R$ 
starting from $\phi$ by comparing $\phi$ with $R(\phi)$ (in general, this guess can be made 
by considering the sets $R^i(\phi)$ up to some finite bound $k$). Once this guess is 
made, the obtained set is added to the computed set of configurations and the 
exploration of the configuration space is continued. 

Let $\phi \subseteq \Sigma^*$ and let $R$ be a regular relation on $\Sigma^*$. Our widening principle 
consists in checking whether there are regular sets $\phi_1$, $\phi_2$, and $\Delta$ such that the 
following two conditions are satisfied 

C1: $\phi = \phi_1 \cdot \phi_2$ and $R(\phi) = \phi_1 \cdot \Delta \cdot \phi_2$, 

C2: $\phi_1 \cdot \Delta^* \cdot \phi_2 = R(\phi_1 \cdot \Delta^* \cdot \phi_2) \cup \phi$, 

and, if C1 and C2 hold, in adding $\phi_1 \cdot \Delta^* \cdot \phi_2$ to the computed set of configurations. 

Intuitively, condition C1 means that the effect of applying $R$ to $\phi$ is to “add” 
$\Delta$ between $\phi_1$ and $\phi_2$. Notice that when $\phi_1$ or $\phi_2$ is equal to $\{\epsilon\}$, this corresponds 
respectively to the case where a growth occurs to the left or to the right of $\phi$. 
Condition C2 implies that $R^*(\phi) \subseteq \phi_1 \cdot \Delta^* \cdot \phi_2$. Indeed, C2 means that $\phi_1 \cdot \Delta^* \cdot \phi_2$ 
is a fixpoint of $T = \lambda X.\, \phi \cup R(X)$ and $R^*(\phi)$ is the least fixpoint of $T$. Hence, 
by adding $\phi_1 \cdot \Delta^* \cdot \phi_2$ to the computed set of configurations, we capture at least 
all the configurations reachable from $\phi$ by iterating $R$. 

The inclusion $\phi_1 \cdot \Delta^* \cdot \phi_2 \subseteq R^*(\phi)$ is not guaranteed in general by C2 (for any 
kind of relation $R$). Nevertheless, we show in the next section that for a large 
class of relations, condition C2 guarantees the exactness of our technique, i.e., it 
computes exactly the set $R^*(\phi)$. 
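
To see why C2 alone does not give exactness, it may help to look at a toy relation on a finite set. The following is a minimal sketch, not taken from the paper: the helper names `image`, `lfp`, and `is_fixpoint` and the two-element relation are illustrative assumptions. It exhibits a set that satisfies the fixpoint equation of C2 and yet strictly contains the reachable set.

```python
def image(rel, xs):
    """R(X): all successors of elements of xs under the finite relation rel."""
    return {v for (u, v) in rel if u in xs}


def lfp(rel, phi):
    """Least fixpoint of T = lambda X: phi | R(X), i.e. R*(phi), by plain iteration."""
    current = set(phi)
    while True:
        nxt = current | image(rel, current)
        if nxt == current:
            return current
        current = nxt


def is_fixpoint(rel, phi, guess):
    """The equation of condition C2 read on finite sets: guess == R(guess) | phi."""
    return set(guess) == image(rel, guess) | set(phi)


# Hypothetical toy relation: "b" rewrites to itself, so "b" can be rewritten
# forever and the inverse relation is not noetherian.
REL = {("b", "b")}
PHI = {"a"}
GUESS = {"a", "b"}
```

Here `is_fixpoint(REL, PHI, GUESS)` holds, while `lfp(REL, PHI)` is only `{"a"}`: the guess is a fixpoint of $T$ but not the least one, so it over-approximates the reachable set.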

The application of the widening principle depends on the quality of the 
“automatic guessing” part which is implicit in C1. Given two regular sets $\phi$ and 
$\phi' = R(\phi)$, we have to find regular sets $\phi_1$, $\phi_2$ and $\Delta$ such that C1 holds, and 
check that for these sets the condition C2 also holds. We use techniques on 
automata that extract these sets from $\phi$ and $\phi'$. Roughly speaking, these 
techniques consist in finding cuts in the automata of $\phi$ and $\phi'$ that delimit $\phi_1$, $\phi_2$ 
and $\Delta$. To reduce the number of choices to examine, we adopt a heuristic which 
consists in considering cuts at entering vertices of strongly connected components. 
This heuristic behaves well because almost always the automata we need 
are of simple forms (e.g., their loops are only self-loops) and the $\Delta$ is very often 
a nonempty word (or a sum of nonempty words $w_1 + \cdots + w_n$), or a set of the 
form $w \cdot \Delta'$ or $\Delta' \cdot w$ where $w$ is a nonempty word. 

4.2 Simple Rewriting Relations 

We introduce in this section a class of relations for which it can be shown that 
our widening technique is exact. 

A regular relation $R$ is unary if it is a subset of $\Sigma \times \Sigma$. A unary relation is 
acyclic if there is no $a \in \Sigma$ such that $(a, a) \in R^+$. A regular relation $R$ is binary 
if it is a subset of $\Sigma^2 \times \Sigma^2$. A binary relation $R$ is a permutation if it is a set of 
pairs of form $(ab, ba)$ where $a, b \in \Sigma$ and $a \neq b$. A permutation is antisymmetric 
if there are no $a, b \in \Sigma$ such that both $(ab, ba) \in R$ and $(ba, ab) \in R$. 

A regular relation is defined to be simple if it is a finite union of relations of 
form $\phi_L \cdot R_l \cdot \phi_R$, such that 

1. In each sub-relation of form $\phi_L \cdot R_l \cdot \phi_R$, the “contexts” $\phi_L$ and $\phi_R$ are regular 
subsets of $R_{id}$, and $R_l$ is either unary or binary, 

2. The union of all unary $R_l$’s is acyclic, 

3. The union of all binary $R_l$’s is antisymmetric. 
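
The side conditions on the local rewriting part can be checked directly when that part is given as a finite set of pairs. The sketch below is only an illustration under that assumption; the function names are hypothetical, and the letter `n` is used as an ASCII stand-in for the "no token" symbol of the token passing example.

```python
from itertools import product


def transitive_closure(pairs):
    """R^+ of a finite relation given as a set of pairs."""
    closure = set(pairs)
    while True:
        extra = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if extra <= closure:
            return closure
        closure |= extra


def is_acyclic_unary(rel):
    """A unary relation (pairs of letters) is acyclic if no (a, a) lies in R^+."""
    return all(a != b for (a, b) in transitive_closure(rel))


def is_permutation(rel):
    """Every pair has the form (ab, ba) with a != b."""
    return all(len(u) == 2 and v == u[::-1] and u[0] != u[1] for (u, v) in rel)


def is_antisymmetric(rel):
    """There are no a, b with both (ab, ba) and (ba, ab) in the relation."""
    return all((v, u) not in rel for (u, v) in rel)


# Local part of token passing: the token 't' moves one position to the right.
TOKEN_PASS = {("tn", "nt")}
```

For instance, `TOKEN_PASS` is an antisymmetric permutation, whereas a relation that swaps two adjacent letters in both directions is a permutation but not antisymmetric.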

The class of simple relations is noncomparable with the class of relations 
of bounded local depth. We can now prove the following theorem, stating that 
conditions C1 and C2 give an exact guess for simple relations. 

Theorem 3. If $R$ is a simple relation, and if there are regular sets $\phi_1$, $\phi_2$, and 
$\Delta$ such that conditions C1 and C2 are satisfied, then $R^*(\phi) = \phi_1 \cdot \Delta^* \cdot \phi_2$. 

To prove this theorem, we need to introduce the notion of noetherian relations. 
We say that a relation $R$ is noetherian if for every word $w \in \Sigma^*$, there is no infinite 
sequence $w_0, w_1, w_2, \ldots$ such that $w_0 = w$ and for every $i \ge 0$, $(w_i, w_{i+1}) \in R$ 
(i.e., no word can be rewritten an infinite number of times). Notice that a length 
preserving relation $R$ is noetherian if and only if $R_{id} \cap R^+ = \emptyset$. We can prove 
the following fact: 







Proposition 1. For every simple relation $R$, both $R$ and $R^{-1}$ are noetherian. 

Then, Theorem 3 can be deduced from Proposition 1 and the following result 
by Fribourg and Olsen on noetherian relations: 

Proposition 2 ([FO97]). Let $R$ be a relation. If $R^{-1}$ is noetherian, then for 
every $\phi, \phi' \subseteq \Sigma^*$, $\phi' = R(\phi') \cup \phi$ if and only if $\phi' = R^*(\phi)$. 

4.3 Constructing Transitive Closures 

Given a length preserving relation R, widening can also be used to compute the 
transitive closure of $R$. Indeed, $R$ can be seen as a language over $\Sigma \times \Sigma$ and $R^+$ 
can be computed as the limit of the sequence $(R^i)_{i>0}$. Our procedure starts from 
$R$ and computes iteratively the sets $R^i$ for increasing $i$’s. Each step consists in 
applying the operation $\lambda X.\, R \circ X$. During this iterative computation, widening 
operations can be applied in order to jump to the limit. Now, from the definition 
of noetherian relations, it is easy to see that: 

Lemma 1. For every noetherian relation $R$, the relation $\{((w, w_1), (w, w_2)) : 
w_2 \in R(w_1)\}$ is noetherian. 

From Lemma 1 and Proposition 1, we deduce that our widening technique is 
also exact when computing the transitive closure of a simple relation R, starting 
from $R$ and applying iteratively $\lambda X.\, R \circ X$. Notice that $R^*$ can be computed in 
the same manner, starting from $R_{id}$ instead of $R$. 

4.4 Example 

We have applied our widening-based techniques to several examples of parame- 
terized systems including the mutual exclusion protocols of Szymanski, Burns, 
Dijkstra, the Bakery protocol of Lamport, as well as the Token passing protocol. 
All these examples can be modeled as simple relations, and our procedure ter- 
minates for all of them and computes the exact set of reachable configurations. 
Moreover, our techniques allow also to construct the transitive closures of the 
relations modeling these systems. 

We show here the application of our techniques on the example of the token 
passing protocol described in section 2.2. It is easy to see that the relation R in 
this model is indeed simple. 

First, let us consider the problem of computing the reachability set. Our 
procedure starts from the initial set of configurations $\phi_I = t\bot^*$ and computes 
the set $R(\phi_I) = \bot t \bot^*$. At this point, it checks that condition C1 holds since 
$R(t\bot^*) = \Delta \cdot t\bot^*$ where $\Delta = \bot$. Then, it checks that condition C2 also holds: 

$R(\bot^* t \bot^*) \cup t\bot^* = \bot^* \bot t \bot^* \cup t\bot^* = \bot^* t \bot^*$ 

Hence, we can apply an exact widening step by adding $\bot^* t \bot^*$ to the set of 
reachable configurations. By doing this, our procedure terminates (since no new 
configurations can be added) and we get the result: 

$R^*(t\bot^*) = \bot^* t \bot^*$ 
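
Because the relation is length preserving, the two conditions can be sanity-checked by brute force on all words up to a fixed length. The following sketch does exactly that; it is an illustration rather than the automata-based procedure of the paper. The ASCII letter `n` stands for $\bot$, regular sets are written as Python regular expressions, and all function names are assumptions made for this example.

```python
import re
from itertools import product

SIGMA = "tn"  # 't' = the process holds the token, 'n' stands for the symbol ⊥


def words(max_len):
    """All words over SIGMA of length 1..max_len."""
    for n in range(1, max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)


def step(w):
    """R(w): pass the token one position to the right (rewrite 'tn' into 'nt')."""
    return {w[:i] + "nt" + w[i + 2:] for i in range(len(w) - 1) if w[i:i + 2] == "tn"}


def image(ws):
    """R applied to a set of words."""
    return {v for w in ws for v in step(w)}


def lang(pattern, max_len):
    """The words of length at most max_len in the language of a regex."""
    return {w for w in words(max_len) if re.fullmatch(pattern, w)}


def check_widening(max_len=7):
    phi = lang(r"tn*", max_len)      # phi = phi1 . phi2 with phi1 = {eps}, phi2 = t n*
    grown = lang(r"ntn*", max_len)   # phi1 . Delta . phi2 with Delta = n
    guess = lang(r"n*tn*", max_len)  # phi1 . Delta* . phi2
    c1 = image(phi) == grown
    c2 = guess == image(guess) | phi
    # Exact reachability on the bounded slice, by plain iteration of R.
    reach = set(phi)
    while not image(reach) <= reach:
        reach |= image(reach)
    return c1, c2, reach == guess
```

Calling `check_widening()` reports whether C1 holds, whether C2 holds, and whether the guessed set coincides with the set computed by plain iteration, all restricted to words of bounded length.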
Now, let us consider the problem of constructing the relation $R^+$. Our 
procedure starts from the relation 

$R = (\bot,\bot)^* \cdot (t,\bot) \cdot (\bot,t) \cdot (\bot,\bot)^*$ 

(We omit here the part of the relation corresponding to idle transitions.) The 
first step is to compute $R^2$ which is: 

$R^2 = (\bot,\bot)^* \cdot (t,\bot) \cdot (\bot,\bot) \cdot (\bot,t) \cdot (\bot,\bot)^*$ 

Then, it can be checked that C1 holds because $R = \phi_1 \cdot \phi_2$ and $R^2 = \phi_1 \cdot \Delta \cdot \phi_2$ 
with $\phi_1 = (\bot,\bot)^* \cdot (t,\bot)$, $\phi_2 = (\bot,t) \cdot (\bot,\bot)^*$, and $\Delta = (\bot,\bot)$. It can also be 
checked that C2 holds: 

$(R \circ (\phi_1 \cdot (\bot,\bot)^* \cdot \phi_2)) \cup R = \phi_1 \cdot (\bot,\bot)^* \cdot (\bot,\bot) \cdot \phi_2 \cup \phi_1 \cdot \phi_2 
= \phi_1 \cdot (\bot,\bot)^* \cdot \phi_2$ 

and hence, our procedure gives the result: 

$R^+ = (\bot,\bot)^* \cdot (t,\bot) \cdot (\bot,\bot)^* \cdot (\bot,t) \cdot (\bot,\bot)^*$ 
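
The closed form for $R^+$ can be cross-checked on words of a fixed length by accumulating $R, R^2, R^3, \ldots$ until nothing new appears. The sketch below is such a brute-force check, not the widening procedure itself; `n` again stands for $\bot$, idle transitions are omitted as in the text, and the function names are illustrative.

```python
from itertools import product


def step(w):
    """One application of R on a word with exactly one token: move 't' one step right."""
    if w.count("t") != 1:
        return set()
    i = w.index("t")
    if i + 1 == len(w):
        return set()
    return {w[:i] + "nt" + w[i + 2:]}


def transitive_closure_slice(n):
    """R^+ restricted to words of length n, as the union of R, R^2, R^3, ..."""
    universe = ["".join(p) for p in product("tn", repeat=n)]
    frontier = {(u, v) for u in universe for v in step(u)}
    closure = set(frontier)
    while frontier:
        # extend the pairs found in the previous round by one more rewriting step
        frontier = {(u, x) for (u, v) in frontier for x in step(v)} - closure
        closure |= frontier
    return closure


def guessed_closure_slice(n):
    """Pairs of length n matching (n,n)* (t,n) (n,n)* (n,t) (n,n)*."""
    pairs = set()
    for i in range(n):
        for j in range(i + 1, n):
            u = "n" * i + "t" + "n" * (n - i - 1)
            v = "n" * j + "t" + "n" * (n - j - 1)
            pairs.add((u, v))
    return pairs
```

For each small length the two functions return the same set of pairs: the token starts at some position and ends at a strictly later one.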

5 Model Checking of ω-Regular Properties 

In this section we will show how to reduce the problem of verifying a property 
specified by a Büchi automaton to the problem of computing the transitive 
closure. A related technique is presented by Pnueli and Shahar [PS00]. Our 
technique is based on the observation that the problem of detecting infinite 
sequences reduces to that of detecting loops. This is true because the transition 
relation is length preserving which implies that each state, which is a word of a 
certain length, can only reach a finite set of states. For a program $P = (\Sigma, \phi_I, R)$, 
we can check for loops by checking the emptiness of the set 

$R^*(\phi_I) \cap R^+ \cap R_{id}$ 
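
On a finite slice of the state space, this emptiness check amounts to asking whether some reachable state can reach itself in one or more steps. A minimal sketch on an explicit finite relation follows; the function names and the two small example relations are assumptions made for illustration, with `n` standing for $\bot$.

```python
def successors(rel, w):
    """R(w) for a finite relation given as a set of pairs."""
    return {v for (u, v) in rel if u == w}


def reachable(rel, init):
    """R*(init): everything reachable from init in zero or more steps."""
    seen, stack = set(init), list(init)
    while stack:
        w = stack.pop()
        for v in successors(rel, w):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def has_reachable_loop(rel, init):
    """True iff some reachable w satisfies (w, w) in R^+."""
    for w in reachable(rel, init):
        # R^+(w) is R* applied to the direct successors of w
        if w in reachable(rel, successors(rel, w)):
            return True
    return False


# Token passing on three processes: a linear array, and a ring variant in
# which the last process hands the token back to the first one.
LINEAR = {("tnn", "ntn"), ("ntn", "nnt")}
RING = LINEAR | {("nnt", "tnn")}
```

The linear array has no loop, so every run on it is finite; the hypothetical ring variant has one, which corresponds to an infinite run.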

We can use this idea to verify that a program satisfies an ω-regular property 
under a set of fairness requirements, as follows. We use the standard technique 
[VW86] of encoding the negation of the property to be checked as a Büchi 
automaton. We also encode each fairness constraint as a Büchi automaton. 
Actually, we can handle parameterized fairness requirements, using the position as 
the parameter: simply associate one Büchi automaton with each position in the 
word, which expresses the fairness constraint for that position. Now construct 
the product of the program with the Büchi automaton for the negation of the 
property, and the Büchi automata for the fairness requirements. We must now 
check whether this product has a reachable “fair loop” in which each Büchi 
automaton visits an accepting state. 

To check for fair loops, we first construct the set of reachable configurations 
of the product. Then, for each Büchi automaton, we add an observer. This is a 
bit which can be initialized to false in a reachable state and thereafter detects 
whether the Büchi automaton has visited some accepting state during a sequence 
of transitions. More precisely, the transition relation is extended so that it sets 
an observer bit to true whenever the corresponding Büchi automaton reaches an 
accepting state; an observer bit can never become false after being set to true. 

Let $R_{aug}$ be the so constructed transition relation containing both Büchi 
automata and their observer bits. Fair loops can now be detected by checking 
whether $R_{aug}^+$ relates a reachable state with all observer bits being false with 
the same reachable state but with all observer bits being true. 

We illustrate this method by verifying a liveness property for the token array 
system, given in Sect. 2.2, extended with fairness constraints. We will verify that 
every process eventually gets the token. The negation of this property is “some 
process never gets the token”, which can be expressed by a Büchi automaton 
accepting an infinite sequence of states of a process where the token is never 
obtained, i.e., an infinite sequence of the symbol $\bot$. We encode this automaton 
by adding one boolean variable $r$ which is true at the position at which the Büchi 
automaton reads symbols and by constraining the transition relation and the set 
of initial configurations so that 

— r is true at exactly one position in the word, 

— the truth value of r never changes in any position, and 

— the token is never passed to the position where r holds. 

We also impose for each process the fairness constraint that “the process may 
not hold the token indefinitely”. For each position, this fairness constraint can 
be expressed by the Büchi automaton 




These Büchi automata, one for each position in the word, are encoded by an 
extra variable $s$, initialized to $s_1$ and ranging over the set of automaton states 
$\{s_1, s_2\}$. The transition relation is extended so that the variable $s$ simulates the 
state of the above automaton. Let $P = (\Sigma, \phi_I, R)$ denote the resulting program 
obtained by adding the variables $r$ and $s$ as described above. 

Finally, a boolean observer $s_{obs}$ is added to each position and the transition 
relation is changed so that $s_{obs}$ becomes true when $s$ becomes the accepting state 
$s_1$. Let $R_{aug}$ be the resulting transition relation. 

Let the variable $\sigma$ range over $\Sigma$, the alphabet of the program $P$. We can 
now check for fair infinite runs that violate the original property (“every process 
eventually gets the token”) by checking emptiness of the set 

$R^*(\phi_I) \cap R_{aug}^+ \cap (\sigma' = \sigma \wedge \neg s_{obs} \wedge s'_{obs})$ 

Note that the set $R^*(\phi_I)$ can be computed as a reachability set. We have 
applied the construction given in Sect. 3 to construct a transducer for $R_{aug}^+$ 
successfully in our implementation. 

6 Conclusions 

We have presented regular model checking, a framework for algorithmic verifica- 
tion of parameterized and infinite-state systems. To solve verification problems in 
this framework, we need to reason about the effect of iterating regular relations 
an unbounded number of times. It is well-known that the verification of safety 
properties reduces to reachability analysis. Moreover, the verification problem 
for any ω-regular property, including liveness properties, can be reduced to the 
construction of the transitive closure of a regular length-preserving relation. 

We have investigated properties of transitive closures, which lead to the de- 
velopment of new techniques for the construction of transitive closures. We have 
also presented widening-based techniques for constructing transitive closures and 
for reachability analysis. These techniques can be combined for computing sets 
of reachable configurations and transitive closures during verification. 
Abstract. We address the problem of automatic analysis of parametric 
counter and clock automata. We propose a semi-algorithmic approach 
based on using (1) expressive symbolic representation structures called 
Parametric DBM’s, and (2) accurate extrapolation techniques that 
speed up the reachability analysis and help its termination. The techniques 
we propose consist in guessing automatically the effect of iterating 
a control loop an arbitrary number of times, and in checking that 
this guess is exact. Our approach can deal uniformly with systems that 
generate linear or nonlinear sets of configurations. We have implemented 
our techniques and tested them on nontrivial examples such as a 
parametric timed version of the Bounded Retransmission Protocol. 

1 Introduction 

Counter automata and clock automata (timed automata) are widely used models 
of both hardware and software systems. A lot of effort has been devoted to the 
design of analysis techniques for these models (see e.g., [AD94,HNSY92,Hal93, 
BW94,BGL98,CJ98]). While the verification problem is undecidable in general 
for counter automata, this problem is decidable for timed automata [AD94], 
and there are model-checking algorithms and efficient verification tools for them 
[DOTY96,LPY97]. 

In this paper, we address the problem of analysing parametric counter and 
timed automata, i.e., models with counters and/or clocks that can be compared 
with parameters defining lower and upper bounds on their possible values. These 
parameters may range over infinite domains and are in general related by a set of 
constraints. We are interested in reasoning in a parametric way about the beha- 
viours of a system: verify that the system satisfies some property for all possible 
values of the parameters, or find constraints on the parameters defining the set of 
all possible values for which the system satisfies a property. These two problems, 
i.e., parametric verification and parameter synthesis, can be solved (in the case 
of safety properties) as reachability problems in parametric models. Unfortunately, 
classical timed automata, where clocks can only be compared to constants, 
do not allow such parametric reasoning. Moreover, it has been shown that for 
parametric timed automata, the reachability problem is undecidable [AHV93]. 



E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 419-434, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




420 A. Annichini, E. Asarin, and A. Bouajjani 



In this paper, we propose a semi-algorithmic approach that makes it possible to deal 
with parametric counter and timed systems. We define new symbolic representations 
for use in their reachability analysis, and provide powerful and accurate 
techniques for computing representations of their sets of reachable configurations. 
The representation structures we define are extensions of the Difference 
Bound Matrices that are commonly used for representing reachability sets of 
(nonparametric) timed automata [Dil89,ACD+92,Yov98]. Our structures, called 
Parametric DBM’s (PDBM’s), encode constraints on counters and/or clocks 
expressing the fact that their values (and their differences) range in parametric 
bound intervals, i.e., the bounds of these intervals depend on the parameters. 
PDBM’s are coupled with a set of constraints on the parameters. Such Constrained 
PDBM’s make it possible to represent linear as well as nonlinear sets of configurations. 
We show in the paper how the basic manipulation operations on DBM’s can be 
lifted to the parametric case, and then, we address the problem of computing 
the set of reachable configurations using Constrained PDBM’s. 

The main contribution of the paper is the definition of accurate extrapo- 
lation techniques that allow to speed up the computation of the reachability 
set and help the termination of the analysis. Our extrapolation technique con- 
sists in guessing automatically the effect of iterating a control loop (a loop in 
the control graph of the model) an arbitrary number of times, and checking, 
also automatically, that this guess is exact. More precisely, we can decide the 
exactness in the linear case and a subclass of the nonlinear case which can be 
reduced to the linear one. Hence, our extrapolation technique allows to gene- 
rate automatically the exact set of reachable configurations. Furthermore, the 
extrapolation principle we propose is simple and uniform for counter and clock 
systems, which allows to consider systems with both kinds of variables. Another 
feature of our techniques is that they allow the automatic analysis of systems 
that generate nonlinear sets of configurations, which is beyond the scope of the 
existing algorithmic analysis techniques and tools. 

We have implemented a package on Constrained PDBM’s as well as a reachability 
analysis procedure based on our extrapolation techniques. We have 
tested our prototype on nontrivial examples including systems generating 
linear sets of reachable configurations, as well as systems generating nonlinear 
sets of constraints. In all these examples, our analysis procedure terminates and 
generates the exact set of reachable configurations. These experiments show that 
our approach is powerful and accurate. In particular, we have been able to verify 
automatically a parametric timed version of the Bounded Retransmission Proto- 
col (BRP) [HSV94]. The model we consider is a parametric timed counter system 
where parameters are constrained by nonlinear formulas (defined in [DKRT97]). 



Outline: In Section 2, we give some basic definitions and introduce the kind of 
constraints and operations we use in our models. In Section 3, we introduce the 
parametric counter and timed systems. In Section 4, we introduce the PDBM’s 
and the basic operations we consider on these structures. In Section 5, we define 
extrapolation techniques and show their use in reachability analysis. In Section 
6, we discuss the current status of our implementation and experiments. 

Related Work: The (semi-)algorithmic symbolic approach has been used for 
counter automata and timed systems in many works such as [CH78,HNSY92, 
Hal93,BW94,HHWT95,BGL98,BGP98,CJ98]. However, none of the existing works 
can deal with systems with nonlinear sets of reachable configurations. 

Our extrapolation techniques have the same motivation as the widening 
operations [CH78,BGP98] used in the framework of abstract interpretation [CC77], 
and the techniques based on the use of meta-transitions [BW94,CJ98]. The aim 
of all these techniques is to speed up the computation of the reachable 
configurations and help the termination of the analysis. However, the existing widening 
techniques compute upper approximations using convex polyhedra. Our 
techniques are more accurate since they compute the exact effect of iterating an 
operation (control loop) an arbitrary number of times. Hence, our techniques 
are similar from this point of view to the techniques based on computing 
meta-transitions such as [BW94]. Compared with the technique of [BW94] for instance, 
our technique can sometimes detect “periodicities” more efficiently because it 
takes into account the set of configurations under consideration. 

Furthermore, our extrapolation techniques are based on a principle of gues- 
sing the effect of the iterations which is in the same spirit as the principle of 
widening. But our principle differs technically from it and also differs from 
widening because we can check that our guess is exact. The principle of checking 
the exactness of a guess has been used in [FO97] in the context of systems on 
strings. However, the techniques in [FO97] are different from ours, and [FO97] 
does not address the question of making a guess. 

The problem of verifying the BRP has been addressed by several researchers 
[HSV94,GdP96,Mat96,DKRT97]. However, these works either provide manual 
proofs, or use finite-state model-checking on abstract versions of the protocol 
or for particular instances of its parameters. In [AAB99a,AAB+99b] we have 
verified automatically an infinite-state version of the protocol with unbounded 
queues. However, we have considered in that work an abstraction of the clocks 
and counters and we have ignored the timing aspects that are addressed in this 
paper. In [DKRT97] the constraints that must be satisfied by the parameters are 
investigated. Then, their automatic verification using Uppaal is done only for a 
finite set of particular values satisfying these constraints. As far as we know, our 
work is the first one which allows to check automatically that these (nonlinear) 
constraints indeed allow the BRP to meet its specifications. 



2 Preliminaries 

Let $X$ be a set of variables and let $x$ range over $X$. The set of arithmetical terms 
over $X$, denoted $AT(X)$, is defined by the grammar: 

$t ::= 0 \mid 1 \mid x \mid t - t \mid t + t \mid t * t$ 
The set of first-order arithmetical formulas over $X$, denoted $FO(X)$, is 
defined by the grammar: 

$\phi ::= t \le t \mid \neg\phi \mid \phi \vee \phi \mid \exists x.\ \phi \mid Is\_int(t)$ 

Formulas are interpreted over the set of reals. The predicate $Is\_int$ expresses 
the constraint that a term has an integer value. 

The fragment of $FO(X)$ of formulas without the $Is\_int$ predicate is called the 
first-order arithmetics of reals and denoted $RFO(X)$. The fragment of $FO(X)$ 
of formulas without multiplication (*) is called the linear arithmetics and is 
denoted $LFO(X)$. It is well-known that the problem of satisfiability in $FO(X)$ 
is undecidable, whereas it is decidable for both fragments $RFO(X)$ and $LFO(X)$. 
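
As an illustration of the term grammar and of the syntactic test that separates the linear fragment from the general one, here is a minimal sketch of arithmetical terms as a small Python data type. The class and function names are assumptions made for this example; they do not come from the paper or from its implementation.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int  # the grammar only provides the constants 0 and 1


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "-", "+", "*"
    left: "Term"
    right: "Term"


Term = Union[Const, Var, BinOp]


def evaluate(t, env):
    """Value of a term under a valuation env mapping variable names to numbers."""
    if isinstance(t, Const):
        return Fraction(t.value)
    if isinstance(t, Var):
        return Fraction(env[t.name])
    left, right = evaluate(t.left, env), evaluate(t.right, env)
    return {"-": left - right, "+": left + right, "*": left * right}[t.op]


def uses_multiplication(t):
    """True if the term contains '*', i.e. it falls outside the linear fragment."""
    if isinstance(t, BinOp):
        return t.op == "*" or uses_multiplication(t.left) or uses_multiplication(t.right)
    return False


# Example term: p * (q + 1)
EXAMPLE = BinOp("*", Var("p"), BinOp("+", Var("q"), Const(1)))
```

With `env = {"p": 3, "q": 2}` the example term evaluates to 9, and `uses_multiplication(EXAMPLE)` is true, so a constraint bounded by this term is nonlinear in the sense used here.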

Let $P$ be a set of parameters. Then, a simple parametric constraint is a 
conjunction of formulas of the form $x \prec t$ or $x - y \prec t$, where $x, y \in X$, 
$\prec \in \{<, \le\}$, and $t \in AT(P)$. We denote by $SC(X, P)$ the set of simple parametric 
constraints. 

Each simple parametric constraint defines a family of convex polyhedra with 
parametric bounds. These polyhedra are of a special kind called zones in the 
timed automata literature. Notice that if all the bounds in a simple constraint are 
parameter-free terms (i.e., they represent constant values), then this constraint 
defines a unique convex polyhedron (zone). 

We consider simple operations on variables corresponding to special kinds 
of assignments. We allow assignments of variables that are either of the form 
$x := y + 1$ or of the form $x := t$, where $x, y \in X$ are variables ($x$ and $y$ may be 
the same variable), and $t \in AT(P)$. 

3 Parametric Timed Counter Systems 

A Parametric Timed System (PTS) is a tuple $\mathcal{T} = (Q, C, P, I, \delta)$ where 

— $Q$ is a finite set of control states, 

— $C = \{c_1, \ldots, c_n\}$ is a finite set of clocks, 

— $P$ is a finite set of parameters, 

— $I : Q \to SC(C, P)$ is a function associating invariants with control states, 

— $\delta$ is a finite set of transitions of the form $(q_1, g, sop, q_2)$ where $q_1, q_2 \in Q$, 
$g \in SC(C, P)$ is a guard, and $sop$ is a simple operation over $C$. 

Clocks and parameters range over a set $\mathbb{D}$ which can be either the set of 
positive reals $\mathbb{R}^+$ (dense-time model) or the set of positive integers $\mathbb{N}$ 
(discrete-time model). Parameters can be seen as variables that are not modified by the 
system (they keep their initial values all the time). A configuration of $\mathcal{T}$ is a 
triplet $(q, \nu, \gamma)$ where $q \in Q$, $\nu : C \to \mathbb{D}$ is a valuation of the clocks, and 
$\gamma : P \to \mathbb{D}$ is a valuation of the parameters. 

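Given a transition $\tau = (q_1, g, sop, q_2) \in \delta$, we define a transition relation 
$\to_\tau$ between configurations: $(q_1, \nu_1, \gamma_1) \to_\tau (q_2, \nu_2, \gamma_2)$ iff $\gamma_1 = \gamma_2$, 
$(\nu_1, \gamma_1) \models g$, and $\nu_2 = sop(\nu_1)$. We also define a time-transition relation 
$\leadsto$ between configurations: $(q_1, \nu_1, \gamma_1) \leadsto (q_2, \nu_2, \gamma_2)$ iff $q_1 = q_2$, 
$\gamma_1 = \gamma_2 = \gamma$, and $\exists r \in \mathbb{D}.\ \nu_2 = \nu_1 + r$ and $\forall r' \le r.\ (\nu_1 + r', \gamma) \models I(q_1)$. 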

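Let $\tau \in \delta$ and let $S$ be a set of configurations. Then, we define $post_\tau(S)$ to be 
the set of configurations $\sigma$ for which some $\sigma' \in S$ is related to $\sigma$ by the composition 
of the time-transition relation and the transition relation of $\tau$, and $post(S) = \bigcup_{\tau \in \delta} post_\tau(S)$. Given 
a sequence of transitions $\theta = \tau_1, \ldots, \tau_n$, we define $post_\theta = post_{\tau_n} \circ \cdots \circ post_{\tau_1}$. 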

A Parametric Counter System (PCS) is a tuple $\mathcal{C} = (Q, X, P, \delta)$ where $X$ is 
a set of integer valued variables (counters), and $Q$, $P$, and $\delta$ are defined in the 
same manner as for PTS’s (substitute $C$ by $X$ in the definition of $\delta$). 

A configuration of $\mathcal{C}$ is a triplet $(q, \nu, \gamma)$ where $q \in Q$, $\nu : X \to \mathbb{N}$, and 
$\gamma : P \to \mathbb{N}$. Given a transition $\tau \in \delta$, we define a relation $\to_\tau$ in the same 
manner as for PTS’s. The function $post_\tau$ here is defined without considering 
time-transitions. 
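
For counter systems, where no time elapses, the successor function can be sketched on explicit finite sets of configurations. The code below is only an illustration under that assumption: the `Transition` class, the encoding of valuations as sorted tuples, and the example loop are all hypothetical, and the paper itself works with symbolic representations rather than enumerated configurations.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def freeze(valuation):
    """Hashable form of a valuation: sorted (name, value) pairs."""
    return tuple(sorted(valuation.items()))


@dataclass(frozen=True)
class Transition:
    source: str
    guard: Callable[[Dict[str, int], Dict[str, int]], bool]          # (counters, params) -> bool
    sop: Callable[[Dict[str, int], Dict[str, int]], Dict[str, int]]  # new counter valuation
    target: str


def post(tau, configs):
    """post_tau(S): fire tau from every configuration of S that enables it."""
    result = set()
    for (q, nu, gamma) in configs:
        counters, params = dict(nu), dict(gamma)
        if q == tau.source and tau.guard(counters, params):
            # parameters are never modified, so gamma is carried over unchanged
            result.add((tau.target, freeze(tau.sop(counters, params)), gamma))
    return result


def post_sequence(taus, configs):
    """post_theta for theta = tau_1, ..., tau_n: apply post_tau_1 first, post_tau_n last."""
    for tau in taus:
        configs = post(tau, configs)
    return configs


# Hypothetical control loop: while x < p do x := x + 1.
LOOP = Transition("q", lambda c, p: c["x"] < p["p"], lambda c, p: {**c, "x": c["x"] + 1}, "q")
START = {("q", freeze({"x": 0}), freeze({"p": 2}))}
```

Starting from `START`, two iterations of `LOOP` lead to the configuration with `x = 2`, and a third iteration yields the empty set because the guard `x < p` no longer holds.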

We can also define parametric models $\mathcal{M} = (Q, C, X, P, I, \delta)$ having both 
counters and clocks by a straightforward extension of the definitions of the PTS’s 
and PCS’s. We do not allow comparisons between clocks and counters in the 
guards and the invariants. We call these models Parametric Timed Counter 
Systems (PTCS’s). 

4 Symbolic Representation Structures 

4.1 Parametric Difference Bound Matrices 

To simplify the presentation, we consider here only the case of PTS’s. The tre- 
atment of counters is analogous since we have the same kind of guards and ope- 
rations on counters as on clocks. We introduce representation structures for sets 
of configurations of PTS’s that are extensions of the Difference Bound Matrices 
used for representing reachability sets of (nonparametric) timed automata. 

Let $\mathcal{T} = (Q, C, P, I, \delta)$ be a PTS, let $C = \{c_1, \ldots, c_n\}$ be the set of its clocks, 
and let $c_0$ be an additional clock whose value is always equal to 0. Then, any 
simple parametric constraint can be represented by an $(n+1) \times (n+1)$ matrix $M$ 
of elements in $AT(P) \times \{<, \le\}$, where each entry $M(i,j) = (t, \prec)$ encodes the 
constraint $c_i - c_j \prec t$. We call such a matrix $M$ a Parametric Difference Bound 
Matrix (PDBM). 

A parameter constraint is a quantifier-free formula in $FO(P)$. A Constrained 
PDBM is a pair $(M, \Phi)$ where $M$ is a PDBM and $\Phi$ is a parameter constraint. 
A symbolic configuration is a pair $(q, (M, \Phi))$ where $q \in Q$ is a control state, and 
$(M, \Phi)$ is a Constrained PDBM representing a set of clock and parameter values. 



4.2 Basic Operations on Constrained PDBM’s 

We define operations for manipulating Constrained PDBM’s by lifting all the 
standard operations on DBM’s [Dil89,ACD+92,Yov98] to the parametric case. 
The operations that are worth discussing are: transformation into a canonical 
form (which is also used for the emptiness check), intersection (used with guards 
and invariants when computing sets of successors), and inclusion test. 
Canonical form Different constrained PDBM’s can represent the same 
parametric set of configurations due to the fact that some of the bounds may not 
be tight enough. For instance, consider the constrained PDBM corresponding to 
the set of constraints: 

$S = (0 \le x \le p_2 \wedge 0 \le y \le p_1 \wedge 0 \le x - y \le 0,\ 0 \le p_1 \le p_2)$ 

It can be seen that if the upper bound on $x$ (i.e., $p_2$) is replaced by $p_1$, then we 
obtain another representation of the same parametric set of configurations as $S$. 
This representation corresponds to the canonical form of $S$. 

We recall that in the nonparametric case, canonical forms of DBM’s are 
constructed using the Floyd-Warshall algorithm which computes the minimum 
path between all pairs of entries. In the parametric case we consider here, we 
follow the same principle by running a symbolic Floyd-Warshall algorithm. 
During its execution, this algorithm needs to determine minimums between terms 
built from those appearing in the original matrix. (We omit here the technical 
discussion about how to deal with strict vs. nonstrict inequalities.) For that, the 
algorithm assumes each of the two possible cases and checks their consistency 
w.r.t. the parameter constraints: given two terms $t_1$ and $t_2$, it considers the case 
where $\min(t_1, t_2) = t_1$ (resp. $\min(t_1, t_2) = t_2$) and adds $t_1 \le t_2$ (resp. $t_1 > t_2$) to 
the parameter constraints, and then it delivers the consistent cases among these 
two, possibly both of them. (We address below the decidability of this consistency 
check.) 

For instance, the canonical form of $S$ can be easily computed in this manner; 
the case splitting gives two cases but one of them is inconsistent ($p_1 > p_2$). 
However, if we remove $p_1 \le p_2$ from $S$, we obtain two constrained PDBM’s 
corresponding to each of the possible cases ($p_1 \le p_2$ or $p_1 > p_2$). Notice that the 
construction of canonical forms also makes it possible to test the emptiness of constrained 
PDBM’s. 
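
To make the tightening step concrete, the sketch below runs the ordinary (nonparametric) Floyd-Warshall closure on the constraint set $S$ after fixing numeric values for $p_1$ and $p_2$. This is a deliberate simplification and not the symbolic algorithm described above: there is no case splitting on parameters, all bounds are treated as nonstrict, and the function names are assumptions made for this example.

```python
def canonical(m):
    """Tighten a DBM with nonstrict bounds, where m[i][j] bounds c_i - c_j from above."""
    n = len(m)
    d = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_empty(d):
    """A tightened DBM denotes the empty set iff some diagonal entry is negative."""
    return any(d[i][i] < 0 for i in range(len(d)))


def dbm_for_s(p1, p2):
    """The clock constraints of S with c0 = 0, c1 = x, c2 = y, for concrete p1 and p2."""
    return [
        [0, 0, 0],   # c0 - c0 <= 0,  c0 - x <= 0,  c0 - y <= 0
        [p2, 0, 0],  # x - c0 <= p2,  x - x <= 0,   x - y <= 0
        [p1, 0, 0],  # y - c0 <= p1,  y - x <= 0,   y - y <= 0
    ]
```

With `p1 = 2` and `p2 = 5`, the upper bound on `x` in `canonical(dbm_for_s(2, 5))` drops from 5 to 2, matching the replacement of $p_2$ by $p_1$. With the values swapped, it is the bound on `y` that is tightened instead, which corresponds to the other branch of the case split.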

Now, in order to check the consistency of each of the possible cases when 
computing the minimum between two terms, we have to test the satisfiability of 
formulas $\phi$ of the form 

$\Phi(P) \wedge t_1 \prec t_2$ 

where $\prec \in \{<, \le\}$ and $\Phi$ is a parameter constraint. If $\phi$ is in $LFO(P)$ (linear 
constraint) or in $RFO(P)$ (all parameters are reals), then this test is decidable. 

If $\phi$ is a nonlinear formula of $FO(P)$ mixing integer and real parameters, this 
test is of course undecidable. Nevertheless, we can still safely test the satisfiability 
of $\phi$ in $RFO(P)$ (i.e., we check the satisfiability of $\phi$ under the assumption that 
all the parameters are reals). If $\phi$ is not satisfiable in $RFO(P)$, we are sure that 
it is not satisfiable for its original interpretation in $FO(P)$. However, $\phi$ could be 
satisfiable in $RFO(P)$ whereas there are no integer valuations of $P$ satisfying 
it. Hence, by interpreting formulas of $FO(P)$ in $RFO(P)$, we consider upper 
approximations of the sets of possible configurations. 

Intersection Given two constrained PDBM’s $S_1 = (M_1, \Phi_1)$ and $S_2 = (M_2, \Phi_2)$,
the intersection of $S_1$ and $S_2$ is represented in general by a set of constrained




Symbolic Techniques for Parametric Reasoning 



425 



PDBM’s. Roughly speaking, the construction consists in computing, for every
i and j, the minimum between the two terms $M_1(i,j)$ and $M_2(i,j)$, under the
parameter constraints $\Phi_1 \wedge \Phi_2$. This is done by case splitting and checking the
consistency of each case, as in the construction of canonical representations
explained above.

Again, checking the consistency of the different cases produced by case splitting
is decidable if the parameter constraints are in LFO(P) or RFO(P). Hence,
in this case the construction of the intersection is exact. In the general case,
checking satisfiability in RFO(P) instead of FO(P) is a safe consistency test
that yields an upper approximation of the intersection.
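
For numeric (nonparametric) matrices the minimum of two entries is known
directly, so the construction reduces to an entrywise minimum followed by
tightening. The sketch below illustrates only that special case; the names
`tighten` and `intersect` and the encoding (entry [i][j] bounds x_i - x_j, index
0 is the constant 0, strictness ignored) are assumptions of this example.

```python
import math

INF = math.inf

def tighten(m):
    """Floyd-Warshall closure; returns None when the DBM is empty."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], m[i][k] + m[k][j])
    return None if any(m[i][i] < 0 for i in range(n)) else m

def intersect(a, b):
    """Entrywise minimum of two DBM's over the same variables, then closure."""
    n = len(a)
    meet = [[min(a[i][j], b[i][j]) for j in range(n)] for i in range(n)]
    return tighten(meet)

at_most_5 = [[0, 0], [5, 0]]       # 0 <= x1 <= 5
at_least_3 = [[0, -3], [INF, 0]]   # x1 >= 3
at_most_2 = [[0, 0], [2, 0]]       # 0 <= x1 <= 2

both = intersect(at_most_5, at_least_3)     # 3 <= x1 <= 5
nothing = intersect(at_most_2, at_least_3)  # contradictory bounds
```

In the parametric setting each call to `min` would instead produce up to two
cases, each with an extra parameter constraint to be checked for consistency.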



Test of inclusion Let $S_1 = (M_1, \Phi_1)$ and $S_2 = (M_2, \Phi_2)$ be two canonical
constrained PDBM’s. The inclusion of $S_1$ in $S_2$ can be expressed by the following
formula $\psi$:

$$\forall P.\ \Phi_1(P) \wedge \Phi_2(P) \Rightarrow M_1 \le M_2$$

The validity of $\psi$ is decidable if it is in LFO(P) or in RFO(P). Otherwise,
we have a safe test of inclusion by checking the validity of $\psi$ in RFO(P). Indeed,
if $\psi$ is valid in RFO(P), then it is also valid in FO(P) and hence, if our inclusion
test answers positively, we are sure that the inclusion holds. However, if $\psi$ is not
valid in RFO(P), it does not mean that $S_1$ is not included in $S_2$.
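
When no parameters occur, the comparison $M_1 \le M_2$ is a plain entrywise
test on a canonical, nonempty first matrix. The following sketch shows that
special case; the name `included` and the numeric encoding (entry [i][j] bounds
x_i - x_j, index 0 is the constant 0) are assumptions made for illustration.

```python
def included(m1, m2):
    """True if every bound of the canonical, nonempty DBM m1 is at most
    the corresponding bound of m2, i.e. the zone of m1 lies inside m2."""
    return all(a <= b
               for row1, row2 in zip(m1, m2)
               for a, b in zip(row1, row2))

narrow = [[0, -3], [5, 0]]    # 3 <= x1 <= 5 (already canonical)
wide = [[0, 0], [10, 0]]      # 0 <= x1 <= 10 (already canonical)

narrow_in_wide = included(narrow, wide)
wide_in_narrow = included(wide, narrow)
```

With parametric entries, each comparison `a <= b` becomes a formula over the
parameters, which is why the test turns into a validity question.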



5 Reachability Analysis 

5.1 Building Symbolic Reachability Graphs 

Let P be a PTCS. We present a procedure which, given a symbolic configuration
S, computes a representation of the set $post^*(S)$. For that, starting from S, we
construct a symbolic reachability graph where each vertex is a symbolic configuration
and edges correspond to transitions of P. The vertices of the symbolic
graph are treated according to a depth-first traversal. The construction stops
when each symbolic configuration that can be generated is covered by (included
in) some symbolic configuration that has already been computed. During this
construction, we use extrapolation in order to help termination.

Our extrapolation technique is based on automatically guessing the effect of
iterating a control loop (a cycle in the control graph of P) an arbitrary number
of times, starting from a given symbolic configuration, and checking that this guess
is exact (does not introduce nonreachable configurations). Informally, we can
present our extrapolation principle as follows: Let S be a symbolic constraint
and let $\theta$ be a control loop, and suppose that the difference (in a sense which
will be defined later) between $post_\theta(S)$ and S, say $\Delta$, is equal to the difference
between $post_\theta^2(S)$ and $post_\theta(S)$. Then, we suspect that the effect of iterating $\theta$
will be to add at each step the same $\Delta$ to the original set, i.e., after n iterations,
the set of reachable configurations will be roughly $S + n\Delta$ (the precise set is given
below). Roughly speaking, our technique consists in guessing that a control loop
$\theta$ (which may be a composition of several simple loops) defines a periodic
operation starting from a particular set of configurations. In many cases, our
guess is exact. Moreover, the exactness of the effect obtained by extrapolation
can be expressed as an arithmetical formula (we discuss later the decidability
issue of this check).

Notice that our extrapolation technique introduces new parameters (n) corresponding
to numbers of iterations of control loops. Let us call these auxiliary
variables iteration parameters and let N be the set of such variables. In order
to deal with sets represented by means of such variables, we have to extend our
symbolic representations and introduce open constrained PDBM’s.



5.2 Open Constrained PDBM’s 

Let N be a (countable) set of iteration variables. An open PDBM is a PDBM
whose elements are in $AT(P \cup N) \times \{<, \le\}$. We also extend the
definitions of constrained PDBM’s and symbolic configurations by considering
that the terms appearing in parameter constraints are in $AT(P \cup N)$.

Now, let us see how to extend the operations on PDBM’s to open PDBM’s. The
construction of canonical forms as well as intersection can be done as previously.
The problematic operation is the test of inclusion. Indeed, given two canonical
constrained open PDBM’s $S_1 = (M_1, \Phi_1)$ and $S_2 = (M_2, \Phi_2)$, the inclusion of
$S_1$ in $S_2$ can be expressed by the formula $\psi$:

$$\forall P.\ \forall N.\ \big(IsInt(N) \wedge \Phi_1(P, N)\big) \Rightarrow \exists N'.\ IsInt(N') \wedge \Phi_2(P, N') \wedge M_1(N) \le M_2(N')$$

When $\psi$ is a linear formula (an LFO formula), the validity of $\psi$ is decidable,
and hence, the inclusion problem between constrained open PDBM’s is decidable
in this case.

Another interesting case is what we call the half-linear case, which corresponds
to the following situation: using quantifier elimination, the formula obtained
from $\psi$ after eliminating all the real-valued parameters in P is a linear
formula on $N \cup N'$. The validity of this formula can be checked since LFO on
integers (Presburger arithmetic) is decidable. The elimination of the real-valued
parameters can be done automatically using the techniques of quantifier elimination
in RFO (we do not need to assume that N and N' are sets of integer
variables).

Using this technique, we can deal with significant cases of systems generating 
nonlinear sets of configurations. For instance, in the analysis of the Bounded 
Retransmission Protocol, all the inclusion tests are half-linear. 

Beyond the class of half-linear systems, the test of inclusion is undecidable.
Nevertheless, even in this general case, it is possible to have a safe test of inclusion.
However, we cannot adopt the naive approach which consists in checking
the validity of $\psi$ in RFO, since $\psi$ has an alternation of universal and existential
quantification. The solution we propose is to define a formula $\psi'$ in RFO
which is “reasonably” stronger than $\psi$. This formula is:

$$\forall P.\ \forall N.\ \Phi_1(P, N) \Rightarrow \exists N'.\ \forall N''.\ \Big(\bigwedge_{i=1}^{|N|} |N'_i - N''_i| \le \tfrac{1}{2}\Big) \Rightarrow \Phi_2(P, N'') \wedge M_1(N) \le M_2(N'')$$

The idea is to require that for every real vector N, there is a real vector N' such
that $M_1 \le M_2$ holds for all the real vectors N'' in a neighborhood of N' which
contains at least one integer vector. Thus, if $\psi'$ holds, necessarily $\psi$ holds too.



5.3 Extrapolation 

We present hereafter our extrapolation principle. We first need to introduce some
notation.

Let $T = (Q, C, X, P, I, \delta)$ be a PTCS. A control loop is a cycle in the graph
$(Q, \delta)$, i.e., a path $(q_1, g_1, sop_1, q'_1) \ldots (q_n, g_n, sop_n, q'_n)$ such that $q_1 = q'_n$ and
$\forall i \in \{1, \ldots, n-1\},\ q'_i = q_{i+1}$.

Given an open PDBM M (resp. symbolic constraint $S = (M, \Phi)$), we denote
by Iter(M) (resp. Iter(S)) the set of iteration variables appearing in M (resp.
in M or $\Phi$). Let $S = (M, \Phi)$ be a constrained PDBM. Given $n \in N$ such that
$n \notin Iter(S)$, we denote by $S{\uparrow}n$ the constrained PDBM $(M, \Phi \wedge n > 0)$. Given
a PDBM M' such that $Iter(M') \subseteq Iter(S)$, we denote by $S + M'$ the symbolic
constraint $(M + M', \Phi)$.

Now, let $\theta$ be a control loop and let (q, S) be a symbolic configuration,
where $S = (M, \Phi)$ is a constrained PDBM. Then, suppose we have computed
$S_1 = (M_1, \Phi_1)$ and $S_2 = (M_2, \Phi_2)$ such that $(q, S_1) = post_\theta(q, S)$ and
$(q, S_2) = post_\theta(q, S_1)$. Let $\Delta = M_1 - M$ and $\Delta' = M_2 - M_1$. Our extrapolation
principle consists in checking whether the two following conditions hold:

C1: $\forall P.\ \forall N.\ IsInt(N) \wedge \Phi_2(P, N) \Rightarrow \Delta = \Delta'$,

C2: $\forall n > 0.\ post_\theta^2(q, S{\uparrow}n + n\Delta) = post_\theta(q, S{\uparrow}n + (n+1)\Delta)$,

and, if C1 and C2 hold, in adding $post_\theta(q, S{\uparrow}n + n\Delta)$ to the computed set of
reachable configurations, and the edge $(q, S_1) \to post_\theta(q, S{\uparrow}n + n\Delta)$ to the
symbolic graph.

Condition C1 says that the effect of $\theta$ after two iterations is to add $\Delta$ at
each step. Notice that we check the equality of the two matrices $M_2 - M_1$ and
$M_1 - M$ under the constraint $\Phi_2$. This constraint is stronger than $\Phi_1$, which is
itself stronger than $\Phi$, due to the fact that each application of $\theta$ may introduce
but never remove parameter constraints. Condition C2 allows us to check that each
application of $\theta$ has the effect of adding $\Delta$, provided the guards and the invariants
in $\theta$ are satisfied. In order to take into account the guards and invariants, we
compute the effect after n+1 iterations of $\theta$ as the $post_\theta$-image of $(q, S{\uparrow}n + n\Delta)$.
We can prove, by straightforward induction, the following fact:

Lemma 1 Let $\theta$ be a control loop, let (q, S) be a symbolic configuration, and let
$\Delta$ be a PDBM. Then, the two following formulas are equivalent:

1. $\forall n > 0.\ post_\theta^2(q, S{\uparrow}n + n\Delta) = post_\theta(q, S{\uparrow}n + (n+1)\Delta)$,

2. $\forall n > 0.\ post_\theta^{n+1}(q, S) = post_\theta(q, S{\uparrow}n + n\Delta)$.

By Lemma 1, we can deduce that, when it can be applied, our extrapolation
principle is exact (it computes precisely the set $post_\theta^{n+1}(q, S)$, for any n > 0).

Both conditions C1 and C2 correspond to arithmetical formulas. These conditions
are of course decidable in the linear case, and they are also decidable in
the half-linear case, i.e., when, after elimination of the real-valued parameters in P, the
obtained formula is linear. Hence, we have an exact extrapolation technique in
these cases.

In the general (nonlinear) case, the test of exactness C2 is actually not relevant,
since we can only compute upper approximations of the reachability set in
this case. So, the extrapolation principle we apply in this case is a weak
extrapolation principle, which consists in checking condition C1 only.

Actually, even in the linear and half-linear cases, it is often not necessary to
check condition C2. It can be observed that the weak extrapolation principle,
even if it is not guaranteed to compute the exact reachability set (it computes
an upper approximation of it in general), is more accurate than existing
widening operators [CH78,Hal93] since it allows periodicities to be captured (see
the examples in Section 5.4). We have used this principle to analyse several
examples of parametric counter and timed systems and in all these cases, our
procedure was able to compute the exact set of reachable configurations. In fact,
we can prove that for an important class of systems, the weak extrapolation
principle is exact. This class includes many of the usual examples encountered
in the literature (Bakery algorithm, lift controller, etc.). For lack of space, we omit
addressing this issue in this version of the paper.



5.4 Examples 

Let us illustrate the use of our reachability analysis techniques on small examples. 



A simple linear system: Let us consider first a very simple counter system
which is described in Figure 1. In this example, x is a counter and T is a parameter.

x < T / x := x + 2

Fig. 1. Linear counter system

We suppose that the initial value of x is 0 and that T is not constrained. So,
the initial symbolic configuration is (0, true). The first execution of the unique
transition $\theta$ of the system creates the edge:

$$(0,\ true) \xrightarrow{\theta} (2,\ 0 < T)$$

since x is incremented by 2, and before that (when its value was 0), it was
compared to T. The second iteration creates the edge

$$(2,\ 0 < T) \xrightarrow{\theta} (4,\ 2 < T)$$

At this point, we can check that condition C1 holds since the effect of the first
and second iterations of $\theta$ is to add the same value 2 to x. Then, by the weak
extrapolation principle, the following edge is created

$$(2,\ 0 < T) \xrightarrow{\theta^*} post_\theta((0,\ 0 < n) + 2n) = post_\theta(2n,\ 0 < n) = (2n + 2,\ 0 < n \wedge 2n < T)$$

Notice that the application of $\theta$ to the symbolic configuration (2n, n > 0)
generated the constraint 2n < T relating n with T. This illustrates
how guards (and also invariants in the case of timed systems) are taken into
account in the extrapolation technique.

It can be easily checked that condition C2 also holds. Indeed, we have

$$post_\theta^2(2n,\ 0 < n) = (2n + 4,\ 0 < n \wedge 2n + 2 < T) = post_\theta(2n + 2,\ 0 < n)$$

and thus, we are sure that our extrapolation is exact.
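
The exactness claim can be cross-checked by brute force on concrete integer
values of T. The sketch below is only an illustration: the helper names
`iterate_loop` and `extrapolated` are invented here, the guard is read as the
strict comparison printed in the text, and a finite range of values is sampled
rather than all of them.

```python
def iterate_loop(T, k):
    """Value of x after k executions of 'x < T / x := x + 2' from x = 0,
    or None if the guard fails before the k-th execution completes."""
    x = 0
    for _ in range(k):
        if not x < T:
            return None
        x += 2
    return x

def extrapolated(T, n):
    """Concrete instance of the extrapolated configuration (2n + 2, 2n < T)."""
    return 2 * n + 2 if 2 * n < T else None

# Compare n + 1 concrete iterations with the n-th instance of the formula.
agree = all(
    iterate_loop(T, n + 1) == extrapolated(T, n)
    for T in range(-2, 20)
    for n in range(0, 15)
)
```

On the sampled range, `agree` is expected to hold because the guards checked
along n + 1 iterations (0 < T, 2 < T, ..., 2n < T) are all implied by the last one.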

It is worth noting that our extrapolation principle can generate periodic sets
whereas classical widening techniques [CH78,Hal93] will not.

A simple half-linear system: Now, let us consider the parametric timed
automaton given in Figure 2, where c and c' are real-valued clocks, and T and
T' are real-valued parameters.

Fig. 2. Half-linear clock system

Let us assume that the initial configuration is
$(c = 0,\ c' = 0,\ 0 < T \wedge 0 < T')$. Then, the first application of the transition $\theta$ of
the system gives:

$$(0,\ 0,\ 0 < T \wedge 0 < T') \xrightarrow{\theta} (T',\ 0,\ 0 < T \wedge 0 < T')$$

and the second iteration gives:

$$(T',\ 0,\ 0 < T \wedge 0 < T') \xrightarrow{\theta} (2T',\ 0,\ 0 < T' < T)$$

Then, condition C1 of our extrapolation principle holds with $\Delta(c) = T'$ and
$\Delta(c') = 0$. Following the weak extrapolation principle, we create a $\theta^*$-edge from
$(T',\ 0,\ 0 < T \wedge 0 < T')$ to $post_\theta(n * T',\ 0,\ 0 < n \wedge 0 < T \wedge 0 < T')$, which is equal
to:

$$((n + 1) * T',\ 0,\ 0 < n \wedge 0 < T' \wedge n * T' < T)$$

The generated symbolic configuration is nonlinear, but still, all decision problems
we need on it are decidable due to the fact that they are half-linear. For instance,
after generating this symbolic configuration, we need to check for its emptiness
by deciding whether the parameter constraints are satisfiable. This is very simple
in this example because after eliminating the parameters T and T' (supposed to
be real-valued), we get a trivial linear constraint on n, which is n > 0. It can also
be seen that in this case, condition C2 (exactness of the extrapolation) can
be checked straightforwardly.



A complex half-linear system: We consider here an example which is inspired
by the model of the Bounded Retransmission Protocol. It consists of a system
with two clocks $c_1$ and $c_2$ and a counter x (see Figure 3).

$c_1 = T_1 \wedge x < M$ / $c_1 := 0;\ x := x + 1$

$c_1 < T_1$

Fig. 3. The Nonlinear Kernel of the BRP

Intuitively, $c_1$ represents the clock of a sender and $c_2$ the clock of a receiver.
These clocks are compared with parametric bounds $T_1$ and $T_2$, supposed to be
real-valued. The counter x counts the number of times the loop $\theta$ on the state q
is performed. The transition $\theta$ corresponds in the BRP to a retransmission action
by the sender. The number of these retransmissions is bounded by M, which is an
integer parameter. We assume that the parameters $T_1$, $T_2$, and M are related by
a nonlinear constraint

$$T_2 \ge M * T_1$$

which means that the timeout of the receiver is at least M (the maximum number
of retransmissions) times the timeout of the sender. The question is whether the
state $q_1$ (considered as a bad state) is reachable under the constraint above.

Roughly speaking, this corresponds to a property of synchronisation between
the sender and receiver in the BRP: the timeout of the receiver should not
expire before the sender has finished all its possible retransmissions. Notice that
the constraint above is not precisely the one considered on the timeouts in the
BRP [DKRT97], but our aim here is just to show simply and faithfully the main
problems that appear in the analysis of a complex system such as the BRP.

Let us compute the reachability set starting from the initial configuration:

$$0 < c_1 < T_1 \wedge 0 < c_2 < T_1 \wedge c_1 - c_2 = 0 \wedge x = 0,$$
$$0 < T_1 \wedge 0 < T_2 \wedge 0 < M \wedge T_2 \ge M * T_1$$

The first two applications of $\theta$ give successively

$$0 < c_1 < T_1 \wedge T_1 < c_2 < 2T_1 \wedge c_1 - c_2 = T_1 \wedge x = 1,$$
$$0 < T_1 \wedge 0 < T_2 \wedge 0 < M \wedge T_2 \ge M * T_1$$

and

$$0 < c_1 < T_1 \wedge 2T_1 < c_2 < 3T_1 \wedge c_1 - c_2 = 2T_1 \wedge x = 2,$$
$$0 < T_1 \wedge 0 < T_2 \wedge 1 < M \wedge T_2 \ge M * T_1$$

At this point, it can be checked that the extrapolation condition C1 holds with
$\Delta(c_1) = 0$, $\Delta(c_2) = T_1$, $\Delta(c_1 - c_2) = T_1$, and $\Delta(x) = 1$. (Notice that, in general,
the differences between lower bounds and upper bounds could be different, and
thus, $\Delta$ may correspond to nondeterministic increases represented by intervals
instead of precise values as in this example and the previous ones.) Then, by the
weak extrapolation principle we can create the configuration:

$$0 < c_1 < T_1 \wedge (n + 1) * T_1 < c_2 < (n + 2) * T_1 \wedge c_1 - c_2 = (n + 1) * T_1 \wedge x = n + 1,$$
$$0 < n \wedge 0 < T_1 \wedge 0 < T_2 \wedge n < M \wedge T_2 \ge M * T_1$$

The transition to the state $q_1$ is executable if the following set of parameter
constraints (obtained by intersection with the guard) is satisfiable:

$$0 < n \wedge 0 < T_1 \wedge 0 < T_2 \wedge n + 1 < M \wedge (n + 1) * T_1 < T_2 < (n + 2) * T_1 \wedge T_2 \ge M * T_1$$

The formula above is actually half-linear because after the elimination of the
real parameter $T_2$ we obtain the constraint

$$0 < n \wedge 0 < T_1 \wedge n + 1 < M \wedge (n + 2) * T_1 > M * T_1$$

and after the elimination of $T_1$ we obtain the formula:

$$0 < n \wedge n + 1 < M \wedge M < n + 2$$

This formula is a linear constraint on integer variables, and thus its satisfiability
can be decided. Clearly the formula above is unsatisfiable since there are no
integers strictly between two successive integers. Hence, we conclude that $q_1$ is
not reachable.
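
The unsatisfiability argument can be sanity-checked by a bounded search. This is
a hedged illustration and not the decision procedure used by the tools: the
function names are invented, the timeouts are sampled on a finite rational grid,
and the lower bounds on n, $T_1$ and $T_2$ are taken as non-strict, which only
enlarges the searched set.

```python
from fractions import Fraction
from itertools import product

def guard_constraints(n, M, T1, T2):
    """Parameter constraints obtained by intersection with the guard."""
    return (0 <= n and 0 <= T1 and 0 <= T2 and n + 1 < M
            and (n + 1) * T1 < T2 < (n + 2) * T1
            and T2 >= M * T1)

def after_elimination(n, M):
    """Linear integer constraint left once T2 and T1 are eliminated."""
    return 0 <= n and n + 1 < M and M < n + 2

grid = [Fraction(k, 4) for k in range(0, 41)]   # timeouts 0, 1/4, ..., 10

original_hits = [
    (n, M, T1, T2)
    for n, M in product(range(0, 6), range(0, 9))
    for T1, T2 in product(grid, grid)
    if guard_constraints(n, M, T1, T2)
]
eliminated_hits = [
    (n, M)
    for n in range(0, 200)
    for M in range(0, 203)
    if after_elimination(n, M)
]
```

Both lists come out empty on the sampled values, in line with the argument that
no integer M lies strictly between n + 1 and n + 2.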

Notice that we have omitted here the test of condition C2. Actually, this test 
can also be decided as a half-linear satisfiability problem. 

6 Implementation and Experiments

We have implemented a package of Constrained PDBM’s containing all the basic
manipulation operations. Our current implementation uses the tool Redlog/Reduce
[DS97,DSW98] for quantifier elimination and deciding satisfiability
in RFO, and uses the tool Omega [BGP97] for deciding satisfiability
in Presburger arithmetic. In the nonparametric case, our package behaves as
a DBM package: all operations involving only constants are done without case
splitting (used for comparing parametric terms) and without invoking Redlog
or Omega.

Based on this package, we have implemented a procedure for reachability 
analysis using extrapolation. This procedure computes, when it terminates, a 
symbolic graph of a given PTCS starting from a given symbolic configuration. 
We have applied our procedure to several examples including counter and timed 
systems that generate linear constraints such as: 

- the Bakery algorithm for mutual exclusion with unbounded ticket counters
and two processes (0.82 sec with Omega as constraint solver),

- the timed parametric Fisher’s mutual exclusion protocol with two processes
(169.64 sec, Omega),

- the parametric lift controller of [Val89] where the number of floors is a
parameter (286.32 sec, Omega),

as well as complex systems generating nonlinear constraints relating clocks and
counters. Indeed, we have applied our techniques to analyse the Bounded Retransmission
Protocol, which involves nontrivial parametric reasoning on both
counters and clocks (about 91 min, Redlog/Reduce). We have considered for this
example the model given in [DKRT97]. Our reachability analysis procedure
has been able to construct a symbolic graph corresponding to the partition of the
set of reachable configurations according to control states. This symbolic graph
represents a finite abstraction of the original infinite-state model. After projection
on external actions and minimisation (using the CADP toolbox [FGK+96]),
we got a finite model with 7 states on which the safety properties of the BRP
have been automatically checked.

The analysis of linear systems such as the Bakery algorithm and the lift controller
has already been done by other researchers using different techniques such
as widening [BGP97,BGL98] or the computation of meta-transitions [BW94].
Our experiments show that our techniques are powerful enough to deal with all
these cases as well as, and in a uniform way, with systems generating nonlinear
constraints that are beyond the scope of the existing methods and tools. In all
the cases we considered, our procedure was able to compute the exact set of
reachable configurations.

7 Conclusion 

We have introduced an extrapolation principle for analysing systems with counters
and clocks based on the use of Parametric DBM’s. Our approach is an
extension of the existing methods on timed automata to the case of systems
with parametric constraints. An interesting feature of our techniques is that
they can be applied uniformly to nonparametric or to parametric systems, to
linear systems or to nonlinear ones (which are beyond the scope of the known
techniques and tools). Moreover, our techniques are accurate and generate exact
reachability sets for a wide class of systems.

We have implemented our techniques in a prototype tool using Redlog/Reduce
and Omega as constraint solvers. The experiments we have done with our prototype
show that our approach is powerful and effective. In particular, we have been
able to verify automatically a parametric timed version of the Bounded Retransmission
Protocol. Future work includes studying other symbolic representations
and associated extrapolation techniques, and identifying classes of arithmetical
constraints that can be handled efficiently. In particular, it would be interesting
to investigate parametric extensions of structures like CDD’s [LPWY99] and
DDD’s [MLAH99].

Finally, let us mention that in this paper we have addressed only forward
reachability analysis. Actually, the techniques we have developed can also be
used for backward analysis.

Abstract. We present an algorithm that constructs a finite state “abstract”
program from a given, possibly infinite state, “concrete” program
by means of a syntactic program transformation. Starting with an initial
set of predicates from a specification, the algorithm iteratively computes
the predicates required for the abstraction relative to that specification.
These predicates are represented by boolean variables in the abstract
program. We show that the method is sound, in that the abstract program
is always guaranteed to simulate the original. We also show that the
method is complete, in that, if the concrete program has a finite abstraction
with respect to simulation (bisimulation) equivalence, the algorithm
can produce a finite simulation-equivalent (bisimulation-equivalent) abstract
program. Syntactic abstraction has two key advantages: it can be
applied to infinite state programs or programs with large data paths, and
it permits the effective application of other reduction methods for model
checking. We show that our method generalizes several known algorithms
for analyzing syntactically restricted, data-insensitive programs.



1 Introduction 

Model Checking [CE81,QS82] is a fully automatic method for checking that a
finite state program satisfies a propositional temporal specification. It has proved
to be quite useful for the analysis of concurrent hardware and software
systems; there are several academic and commercial model checking tools. The
main obstacle to the wider application of model checking is the exponential growth
in the size of the state space with increasing program size: current tools are
typically limited to programs with a few hundred boolean state variables. There
are two main approaches to ameliorating this state-explosion problem: compositional
verification, where one manually constructs a proof outline that exploits
the compositional structure of the program, while model-checking sufficiently
small components, and abstraction, where one constructs a smaller abstract program
in a manner which ensures that the specification holds for the original
program if it holds for the abstract program.

Abstraction is often carried out manually and justified only informally. For
large programs, such manual abstraction is error-prone and often infeasible. Our
focus in this paper, therefore, is on automating the abstraction process. We
present an algorithm that constructs a finite state “abstract” program from a
given, possibly infinite state, “concrete” program by means of a syntactic program
transformation. The abstract program represents predicates of the concrete
program by boolean variables. Starting with the atomic predicates from a specification
or statement of a property that is to be verified, the algorithm iteratively
computes the predicates required for the abstraction relative to that specification,
as well as the necessary updates to the corresponding boolean variables.
This is achieved by a syntactic analysis that does not construct the explicit transition
graph either of the original or of the abstract program, each of which may
be too large to compute.

We show that the algorithm is sound, in that the abstract program is always
guaranteed to be a conservative approximation of (i.e., simulates) the original,
with respect to the set of specification predicates, AP. Under certain conditions,
the algorithm produces an exact (i.e., bisimilar) abstraction of the original program.
The soundness result implies that a temporal property in ACTL* over AP
holds for the original program if it holds for the abstraction. For an exact abstraction,
this is true for properties written in the full μ-calculus. We also show
that the algorithm is complete in the sense that, if the state transition graph
of the original program has a finite simulation (bisimulation) quotient, then the
algorithm can produce a finite simulation-equivalent (bisimulation-equivalent)
abstract program.

Syntactic abstraction has several advantages: 

- The algorithm produces an implicit (syntactic) description of the abstract
program. Hence, other methods for model checking, such as symbolic (BDD-based)
model checking and partial-order reduction, can be applied to the
abstract program.

- It supports the abstraction of data-insensitive programs with large or infinite
data paths. Our method generalizes several earlier algorithms for analyzing
data-insensitive programs and may also be used in other cases, such as symmetric
programs, where large reductions can be achieved through bisimulation
minimization.

- It can often be substantially more efficient than the symbolic minimization
algorithms of [BFH90,LY92,HHK95] as, unlike these methods, our algorithm
does not construct the explicit transition graph of the minimized system,
which could be quite large.

- For programs with bounded non-determinacy, our algorithm is able to construct
abstractions without the manual application of theorem-provers, as
proposed for other predicate abstraction methods (cf. [GS97,BLO98]).

The paper is structured as follows. In Section 2, we provide some background 
on simulation, bisimulation, temporal logic and model checking. We describe the 
basic algorithm in Section 3 and prove the soundness and completeness claims. In 
Section 4, we present some useful extensions to the basic algorithm. In Section 5, 
we describe several applications of the algorithm. Section 6 concludes the paper 
with a discussion of related work. 




Syntactic Program Transformations for Automatic Abstraction 437 



2 Background 

We provide some background on abstraction and model checking, and present a 
simple program syntax on which the analyses of the algorithm are defined. 

2.1 Labeled Transition Systems 

The state transition graph of a program is represented by a Transition System
(TS, for short) [Kel76], which is a tuple $(S, \Delta, I, AP, L)$, where

• S is the set of states,

• $\Delta \subseteq S \times S$ is the (left-total) transition relation. We write $s \to t$ instead of
$(s, t) \in \Delta$ for clarity.

• $I \subseteq S$ is the set of initial states,

• AP is the set of atomic propositions, and

• $L : S \to 2^{AP}$ is the state labeling function, which maps each state to the set
of atomic propositions that hold at that state.

A computation $\sigma$ of the TS is an infinite sequence of states such that $\sigma_0 \in I$,
and for each i, $\sigma_i \to \sigma_{i+1}$. A TS is often constrained by a fairness condition,
expressed as a boolean combination of basic fairness conditions “infinitely often
p”, where p is a set of state pairs¹. A fair computation of the TS is a computation
that satisfies the fairness condition, where the basic fairness formula above is
satisfied iff transitions from p appear infinitely often on the sequence.

2.2 The μ-Calculus

The μ-calculus [Koz83] is a branching-time temporal logic, where formulas are
built from atomic propositions, boolean connectives, the least-fixpoint operator,
and the modality $\langle\rangle\phi$ (“there is a successor satisfying $\phi$”). For a state s in a
TS M and a μ-calculus formula $\phi$ over the atomic propositions of M, we write
$M, s \models \phi$ to mean that “$\phi$ is true at state s in model M”. The sub-logic $A\mu$
consists of those μ-calculus formulae where every $\langle\rangle$ operator is under an odd
number of negations. It is possible [EL86] to encode a number of temporal logics,
including CTL, CTL* [CES86,EH86] and LTL [Pnu77], into the μ-calculus.

2.3 Simulation and Bisimulation 

The correctness of any abstraction method is determined by the nature of the
relationship between the concrete and the abstract program. A typical relationship
is that the abstract program is able to match every computation of the
concrete program. This is formalized in the definitions below.

Definition 0 (Simulation Relation) [Mil71] A relation $R \subseteq S \times S$ is a simulation
relation on a TS $M = (S, \Delta, I, AP, L)$ iff for any $(s, t) \in R$, $L(s) = L(t)$
and for any u such that $s \to u$, there exists v such that $t \to v$ and $(u, v) \in R$.

¹ The usual unconditional, weak and strong fairness conditions can be written as
boolean combinations of basic fairness conditions.

Definition 1 (Bisimulation Relation) [Par81] A relation R is a bisimulation
on TS M iff R is symmetric and a simulation relation on M.

State t simulates a state s iff (s, t) is in the greatest simulation, which exists
as simulations are closed under arbitrary union. States s and t are simulation-equivalent
(written as $s \sim t$) iff they simulate each other. States s and t are
bisimilar in M (written as $s \approx t$) iff (s, t) is contained in the greatest bisimulation
relation on M. The connection between model checking and these notions
of program equivalence is as follows.
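
For a finite TS, the greatest simulation can be computed by starting from all
label-compatible pairs and deleting pairs that violate the definition until
nothing changes. The sketch below is a naive illustration of that fixpoint, not
an efficient algorithm; the function name and the small example system are
invented for this purpose.

```python
def greatest_simulation(states, trans, label):
    """Largest R such that (s, t) in R implies label[s] == label[t] and
    every move s -> u is matched by some move t -> v with (u, v) in R."""
    succ = {s: {v for (u, v) in trans if u == s} for s in states}
    rel = {(s, t) for s in states for t in states if label[s] == label[t]}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            # (s, t) fails if some successor u of s has no matching v.
            if any(all((u, v) not in rel for v in succ[t]) for u in succ[s]):
                rel.discard((s, t))
                changed = True
    return rel

states = ["a", "b", "c", "d"]
trans = {("a", "b"), ("a", "c"), ("d", "b"), ("b", "b"), ("c", "c")}
label = {"a": "p", "d": "p", "b": "q", "c": "r"}

sim = greatest_simulation(states, trans, label)
a_simulates_d = ("d", "a") in sim   # a can match the only move of d
d_simulates_a = ("a", "d") in sim   # d cannot match the move a -> c
```

Here a pair (s, t) in the result means that t simulates s, following the
convention of Definition 0.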

Theorem 0 For a TS M and states s, t in M,

1. (cf. [GL94]) For any $A\mu$ formula f on AP, if t simulates s then $M, t \models f$
implies $M, s \models f$.

2. (cf. [BCG88]) For any μ-calculus formula f on AP, if $s \approx t$ then $M, s \models f$
iff $M, t \models f$. □

As $\sim$ ($\approx$) is an equivalence relation, it induces a quotient TS whose states
are the equivalence classes of M w.r.t. $\sim$ ($\approx$), the initial set of states is the
equivalence classes of the initial states of M, and there is a transition (C, a, D)
iff $(\exists s, t : s \in C \wedge t \in D : s \to t)$.
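
For the bisimulation case on a finite TS, the quotient can be sketched with a
simple signature-refinement loop: states start grouped by label and are split
until states in the same group reach the same groups in one step. The function
name and the example are assumptions of this illustration, and transitions are
taken as unlabeled pairs.

```python
def bisimulation_classes(states, trans, label):
    """Map each state to an identifier of its bisimulation class."""
    succ = {s: {v for (u, v) in trans if u == s} for s in states}
    block = {s: label[s] for s in states}      # initial partition: by label
    while True:
        # Signature: own block plus the set of blocks reachable in one step.
        sig = {s: (block[s], frozenset(block[v] for v in succ[s]))
               for s in states}
        if len(set(sig.values())) == len(set(block.values())):
            return block                       # no block was split: stable
        block = sig

states = [0, 1, 2, 3, 4]
trans = {(0, 1), (1, 0), (2, 3), (3, 2), (4, 4)}
label = {0: "p", 1: "q", 2: "p", 3: "q", 4: "p"}

cls = bisimulation_classes(states, trans, label)
quotient_states = set(cls.values())
quotient_edges = {(cls[s], cls[t]) for (s, t) in trans}
```

In this example states 0 and 2 end up in one class, 1 and 3 in another, and
state 4 is separated because its only successor is labeled p.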



2.4 Program Syntax 

Our algorithms transform one program text to another. For our purposes, instead
of specifying a particular program syntax, it suffices to consider a program as
being defined on a set of variables, and being specified once an initial condition,
a transition relation and a fairness constraint are defined. These are specified
syntactically as predicates: a predicate is a quantifier-free formula of a first-order
logic. The relation symbols form the atomic predicates. Some relation and
function symbols may have fixed interpretations, for instance =, <, +.

Definition 2 (Program) A program is specified by a tuple $(X, I, T, F)$, where

- $X$ is a finite, non-empty set of variables. Each variable $x$ has an associated
domain of values, $dom(x)$.

- $I(X)$ is the initial condition, specified as a predicate on $X$.

- $T(X, X')$ is the transition relation, specified as a predicate on $X \cup X'$, where
$X'$ is a set of “next-state” variables that is in 1-1 correspondence with $X$.

- $F(X, X')$ is the fairness condition, specified as a boolean combination of the
basic fairness condition “infinitely-often $p$”, for a predicate $p$ over $X \cup X'$.

The semantics of such a program is given by a TS defined as follows. The
state space is the Cartesian product of the domains of the variables. The value of
an expression $e$ in state $s$ is denoted by $e(s)$; this can be defined by induction on
the expression syntax. An initial state is one for which the initial state predicate
evaluates to true. The transition relation is defined as follows: there is a transition
$s \rightarrow t$ iff $T(s,t)$ is true. The state labeling function $L$ is defined by: for each
atomic predicate $P$, $P \in L(s)$ iff $P(s) = true$. The fairness condition of the fair
TS is obtained in a straightforward manner from the condition $F$.
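
For finite variable domains, the semantics just described can be enumerated directly. The following Python sketch is only an illustration and ignores fairness; the function name `transition_system` and the representation of states as dictionaries are assumptions, not notation from the paper.

```python
from itertools import product

def transition_system(domains, init, trans):
    """Enumerate the TS of a program (X, I, T) whose variables have finite domains.

    domains: dict mapping each variable name to an iterable of values
    init:    the predicate I, evaluated on a state (dict from names to values)
    trans:   the predicate T, evaluated on a pair (state, next_state)
    """
    names = sorted(domains)
    # The state space is the Cartesian product of the variable domains.
    states = [dict(zip(names, vals)) for vals in product(*(domains[n] for n in names))]
    initial = [s for s in states if init(s)]
    # There is a transition s -> t iff T(s, t) is true.
    edges = [(s, t) for s in states for t in states if trans(s, t)]
    return states, initial, edges

# Example: a counter modulo 3 that starts at 0.
states, initial, edges = transition_system(
    {"x": range(3)},
    init=lambda s: s["x"] == 0,
    trans=lambda s, t: t["x"] == (s["x"] + 1) % 3,
)
```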

We develop our algorithm under the assumption that the set of predicates
is effectively closed under the application of the “weakest liberal precondition”
transformer [Dij75], denoted by $wlp(P)$ (in terms of the $\mu$-calculus, $wlp(P) =$

Typical programming constructs can be rewritten into the program syntax
presented above. For example, a guarded command [Dij75], which has the form
$g(X) \rightarrow U := e(X)$, defines the transition relation
$g(X) \wedge (\bigwedge i : x_i \in U : x_i' = e_i(X)) \wedge (\bigwedge j : x_j \in X \setminus U : x_j' = x_j)$.
For guarded commands, $wlp(P)$
can be calculated by simple substitution as $(g(X) \Rightarrow P[U \leftarrow e(X)])$, where
$P[U \leftarrow e(X)]$ is the predicate obtained by replacing each occurrence of $x_i \in U$
by $e_i$ in $P$, for all $i$. Programs with external, non-deterministic inputs can be
defined by partitioning the set of variables $X$ into the input variables $Y$, which
are unconstrained, and state variables $Z$, whose next-state values are constrained
by the transition relation. Using the syntax $[Y] : g(X) \rightarrow U := e(X)$ for
describing such transitions, $wlp(P)$ is given by $(\forall Y :: g(X) \Rightarrow P[U \leftarrow e(X)])$.
In this case, it may be necessary to use a quantifier-elimination procedure, such
as that for Presburger arithmetic, to re-write $wlp(P)$ as a predicate.
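
The substitution-based calculation of $wlp$ for a guarded command can be sketched in a few lines of Python. The encoding is an assumption made for illustration: expressions are nested tuples `(operator, operand, ...)`, variables are strings, and the helper names `substitute` and `wlp` are hypothetical. No simplification or quantifier elimination is attempted.

```python
def substitute(expr, assignment):
    """Return expr with every assigned variable replaced by its right-hand side.

    The replacement is simultaneous: substituted right-hand sides are not
    rewritten again, which matches the meaning of P[U <- e(X)].
    """
    if isinstance(expr, str):                      # a variable name
        return assignment.get(expr, expr)
    if isinstance(expr, tuple):                    # (operator, operand, ...)
        return (expr[0],) + tuple(substitute(arg, assignment) for arg in expr[1:])
    return expr                                    # a constant

def wlp(guard, assignment, post):
    """Syntactic wlp of post for the guarded command  guard -> U := e(X)."""
    return ("implies", guard, substitute(post, assignment))

# Guarded command  x < n -> x := x + 1  with postcondition  x <= n.
result = wlp(("lt", "x", "n"), {"x": ("add", "x", 1)}, ("le", "x", "n"))

# Simultaneous assignment  x, y := y, x  applied to the predicate  x < y.
swapped = substitute(("lt", "x", "y"), {"x": "y", "y": "x"})
```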



3 The Abstraction Algorithm 

We motivate and describe our algorithm for the simpler case of programs that
exhibit bounded non-determinism. For such programs, the transition relation
$T(X,X')$ is equivalent to a bounded disjunction $(\bigvee i :: T_i(X,X'))$, where each
$T_i$ is deterministic, i.e., for any state $s$, there is at most one state $t$ such that
$T_i(s,t)$ holds. We assume that the transition relation is given in this form and
refer to each $T_i$ as an action. The key property that we exploit is that for a
deterministic action $a$, $wlp_a$ distributes over all boolean operators.

We do not consider fairness conditions in this section. The extensions needed 
to handle fairness and unbounded nondeterminism are described in Section 4. 
The results established here carry over, essentially unchanged, to the general 
case. 
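
The distribution property for deterministic actions, and its failure for nondeterministic ones, can be checked by brute force on a small state space. The Python sketch below is an illustration only: predicates are modelled as functions on integer states, and the names `wlp_det` and `wlp_nondet` are assumptions.

```python
def wlp_det(guard, update):
    """wlp transformer of the deterministic action  guard -> state := update(state)."""
    return lambda post: (lambda s: (not guard(s)) or post(update(s)))

def wlp_nondet(successors):
    """wlp transformer of an action described by a successor-set function."""
    return lambda post: (lambda s: all(post(t) for t in successors(s)))

STATES = range(6)
P = lambda s: s % 2 == 0
Q = lambda s: s >= 2
FALSE = lambda s: False

# Deterministic action  s < 3 -> s := s + 1.
wlp_a = wlp_det(lambda s: s < 3, lambda s: s + 1)

# Negation: wlp_a(not P) = wlp_a(false) or not wlp_a(P).
negation_ok = all(
    wlp_a(lambda s: not P(s))(s) == (wlp_a(FALSE)(s) or not wlp_a(P)(s))
    for s in STATES
)
# Disjunction: wlp_a(P or Q) = wlp_a(P) or wlp_a(Q).
disjunction_ok = all(
    wlp_a(lambda s: P(s) or Q(s))(s) == (wlp_a(P)(s) or wlp_a(Q)(s))
    for s in STATES
)

# A nondeterministic action that moves to 0 or to 1 breaks distribution over "or".
wlp_n = wlp_nondet(lambda s: [0, 1])
is_zero = lambda t: t == 0
is_one = lambda t: t == 1
lhs = wlp_n(lambda t: is_zero(t) or is_one(t))(0)  # every successor is 0 or 1
rhs = wlp_n(is_zero)(0) or wlp_n(is_one)(0)        # neither holds for every successor
```

Here `lhs` and `rhs` differ, which is why the algorithm of this section treats each deterministic action separately.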



3.1 A Motivating Example: the 2-Process Bakery Protocol 

Consider the 2-process “Bakery” mutual exclusion protocol [Lam74] presented
in Figure 1 as a finite collection of guarded commands. The specification of
mutual exclusion in CTL is $AG(\neg(st_1 = C \wedge st_2 = C))$. To verify this property,
an abstraction of the protocol needs to preserve at least the atomic predicates
$st_1 = C$ and $st_2 = C$, as well as those in the initial condition: $st_1 = N$, $st_2 = N$,
$y_1 = 0$, $y_2 = 0$. We may choose to retain the variables $st_1, st_2$, as they have

2 In the case of negation, $wlp_a(\neg P) = wlp_a(false) \vee \neg wlp_a(P)$.

var st_1, st_2 : {N, W, C}   (* N = “Non-critical”, W = “Waiting”, C = “Critical” *)
var y_1, y_2 : natural
initially (st_1 = N) ∧ (y_1 = 0) ∧ (st_2 = N) ∧ (y_2 = 0)

action wait_1      st_1 = N → st_1, y_1 := W, y_2 + 1
action enter_1     st_1 = W ∧ (y_2 = 0 ∨ y_1 ≤ y_2) → st_1 := C
action release_1   st_1 = C → st_1, y_1 := N, 0
action wait_2      st_2 = N → st_2, y_2 := W, y_1 + 1
action enter_2     st_2 = W ∧ (y_1 = 0 ∨ y_2 < y_1) → st_2 := C
action release_2   st_2 = C → st_2, y_2 := N, 0

Fig. 1. The 2-process Bakery mutual exclusion algorithm.



small finite domains, but we would want to abstract $y_1, y_2$, as they have infinite
domains, retaining only those predicates on $y_1, y_2$ which are necessary to preserve
the control flow of the protocol. We do so by introducing auxiliary boolean
variables $b_1$ and $b_2$ which represent $y_1 = 0$ and $y_2 = 0$, respectively. The initial
condition can now be expressed as $(st_1 = N) \wedge b_1 \wedge (st_2 = N) \wedge b_2$.

To preserve the correspondence between $b_1$ and $y_1 = 0$, we need to compute
an update to $b_1$ for each action. For a deterministic action $a$, $b_1$ is true after an
update to $y_1$ exactly if $wlp_a(y_1 = 0)$ is true before the update. For the action
wait_1, the syntactic wlp calculation yields $((st_1 = N) \Rightarrow (y_2 + 1 = 0))$. Now,
however, we have a new predicate $(y_2 + 1 = 0)$, which can be simplified to false,
as $y_2$ is a natural number. Hence, the modified action is:

action wait_1   st_1 = N → st_1, y_1, b_1 := W, y_2 + 1, (st_1 = N ⇒ false)

We may simplify this further by replacing occurrences of the guard expression
by true in the assignment, to get:

action wait_1   st_1 = N → st_1, y_1, b_1 := W, y_2 + 1, false

The result of $wlp_{enter_1}(y_1 = 0)$ is $((st_1 = W \wedge (y_2 = 0 \vee y_1 \leq y_2)) \Rightarrow (y_1 = 0))$.
In this case, we have a new atomic predicate, $y_1 \leq y_2$, which must therefore
be tracked by a new boolean variable, $b_3$. Repeating the steps above, we get:

action enter_1   st_1 = W ∧ (b_2 ∨ b_3) → st_1, b_1 := C, b_1
This iterative process of computing weakest preconditions and collecting new
predicates terminates for our example program; i.e., after a finite number of
iterations, no new predicates are generated. Hence, it suffices to consider only these
predicates to verify the property. As the auxiliary boolean variables track the
predicates exactly, the infinite-domain variables $y_1$ and $y_2$ are unnecessary and
can be removed, resulting in the finite-state abstraction shown in Figure 2, which
is bisimulation-equivalent to the original with respect to the initial predicate set
$\{st_1 = C, st_2 = C\}$. From Theorem 0, the mutual exclusion property is true of
the original program if and only if it is true of the abstraction.

var st_1, st_2 : {N, W, C}    (* N = “Non-critical”, W = “Waiting”, C = “Critical” *)
var b_1, b_2, b_3 : boolean   (* b_1 = (y_1 = 0), b_2 = (y_2 = 0), b_3 = (y_1 ≤ y_2) *)
initially (st_1 = N) ∧ b_1 ∧ (st_2 = N) ∧ b_2 ∧ b_3

action wait_1      st_1 = N → st_1, b_1, b_2, b_3 := W, false, b_2, false
action enter_1     st_1 = W ∧ (b_2 ∨ b_3) → st_1, b_1, b_2, b_3 := C, b_1, b_2, b_3
action release_1   st_1 = C → st_1, b_1, b_2, b_3 := N, true, b_2, true

action wait_2      st_2 = N → st_2, b_1, b_2, b_3 := W, b_1, false, true
action enter_2     st_2 = W ∧ (b_1 ∨ ¬b_3) → st_2, b_1, b_2, b_3 := C, b_1, b_2, b_3
action release_2   st_2 = C → st_2, b_1, b_2, b_3 := N, b_1, true, b_1

Fig. 2. Abstraction of the 2-process Bakery mutual exclusion algorithm.
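
The claim that the boolean variables track their predicates exactly can be sanity-checked by brute force. The Python sketch below is not part of the paper. It assumes that $b_3$ stands for $y_1 \leq y_2$ and that the guard of enter_1 is $y_2 = 0 \vee y_1 \leq y_2$, which is the reading under which the initial condition and the release_1 update of Figure 2 are consistent. All names (`CONCRETE`, `ABSTRACT`, `alpha`, `tracks_exactly`, `reachable_abstract`) are hypothetical.

```python
from itertools import product

# Concrete actions over states (st1, st2, y1, y2), given as (guard, update) pairs.
CONCRETE = {
    "wait1":    (lambda s: s[0] == "N", lambda s: ("W", s[1], s[3] + 1, s[3])),
    "enter1":   (lambda s: s[0] == "W" and (s[3] == 0 or s[2] <= s[3]),
                 lambda s: ("C", s[1], s[2], s[3])),
    "release1": (lambda s: s[0] == "C", lambda s: ("N", s[1], 0, s[3])),
    "wait2":    (lambda s: s[1] == "N", lambda s: (s[0], "W", s[2], s[2] + 1)),
    "enter2":   (lambda s: s[1] == "W" and (s[2] == 0 or s[3] < s[2]),
                 lambda s: (s[0], "C", s[2], s[3])),
    "release2": (lambda s: s[1] == "C", lambda s: (s[0], "N", s[2], 0)),
}

# Abstract actions over states (st1, st2, b1, b2, b3), as read from Figure 2.
ABSTRACT = {
    "wait1":    (lambda t: t[0] == "N", lambda t: ("W", t[1], False, t[3], False)),
    "enter1":   (lambda t: t[0] == "W" and (t[3] or t[4]), lambda t: ("C",) + t[1:]),
    "release1": (lambda t: t[0] == "C", lambda t: ("N", t[1], True, t[3], True)),
    "wait2":    (lambda t: t[1] == "N", lambda t: (t[0], "W", t[2], False, True)),
    "enter2":   (lambda t: t[1] == "W" and (t[2] or not t[4]),
                 lambda t: (t[0], "C") + t[2:]),
    "release2": (lambda t: t[1] == "C", lambda t: (t[0], "N", t[2], True, t[2])),
}

def alpha(s):
    """Map a concrete state to the abstract state that records the three predicates."""
    st1, st2, y1, y2 = s
    return (st1, st2, y1 == 0, y2 == 0, y1 <= y2)

def tracks_exactly(bound):
    """Check guards and updates on all concrete states with y1, y2 below bound."""
    for s in product("NWC", "NWC", range(bound), range(bound)):
        for name, (guard, update) in CONCRETE.items():
            a_guard, a_update = ABSTRACT[name]
            if bool(guard(s)) != bool(a_guard(alpha(s))):
                return False
            if guard(s) and alpha(update(s)) != a_update(alpha(s)):
                return False
    return True

def reachable_abstract():
    """Explore the abstract program from its initial state."""
    init = ("N", "N", True, True, True)
    seen, frontier = {init}, [init]
    while frontier:
        t = frontier.pop()
        for guard, update in ABSTRACT.values():
            if guard(t):
                nxt = update(t)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen

exact = tracks_exactly(6)
reach = reachable_abstract()
mutex = all(not (t[0] == "C" and t[1] == "C") for t in reach)
```

The check over a bounded range of ticket values is evidence rather than proof; the exactness argument in the text relies on the wlp calculation, not on enumeration.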



3.2 The Algorithm 

The Bakery example introduced the key ingredients of our algorithm: starting
from the atomic predicates in the specification formula, predicates of the
original program are represented by boolean variables in the abstraction, exact
updates for these boolean variables are computed by a syntactic wlp
computation, possibly introducing new predicates to be examined, and simplifications are
performed to avoid introducing new predicates that are syntactically distinct but
semantically identical to predicates generated earlier. The algorithm is presented
in its entirety in Figure 3.

The algorithm maintains a correspondence table, $C$, which relates syntactic
atomic predicates to corresponding boolean variables. The algorithm also
maintains two sets of atomic predicates: oldPred, which consists of those predicates
for which wlp has been calculated, and newPred, which consists of the
unexamined predicates. Initially (step 1), oldPred is empty, and newPred contains
the atomic predicates of the specification formula, the initial condition, and the
actions. In each iteration (steps 2a-2c), wlp is calculated for each predicate in
newPred and the result is massaged (described below) to extract new predicates,
for which new boolean variables are introduced. In step 3, the abstract
transition relation is defined by updating each boolean variable with the expression
formed by massaging wlp for the corresponding predicate.

Although the process of generating new predicates terminates for the Bakery
example, in general, it may not terminate. To ensure termination, we iterate
this process for $K$ steps, where $K$ is a parameter to the algorithm. After $K$
iterations, however, the predicates in newPred have not been processed by a
wlp computation. Boolean input variables are introduced for these predicates,
which are constrained by $\Psi$ to valuations that are consistent relative to their
corresponding predicates. Similarly, $\Phi$ constrains the initial values of boolean
variables corresponding to the predicates in oldPred. In the definition of $\bar{T}$, the
wlp operator is applied only to predicates from oldPred, so the result is in terms
of $oldPred \cup newPred$ and gets massaged into an expression on the corresponding
boolean variables.

It is not necessary to add $\Phi$ and $\Psi$ to obtain a program that is a conservative
approximation; these are needed only to establish the completeness results. If
the atomic predicates come from a class with quantifier elimination (such as
Presburger arithmetic), $\Phi$ and $\Psi$ can be computed syntactically. In general, the
computability of $\Phi$ and $\Psi$ is equivalent to the decidability of satisfiability for
predicates: in the worst case, $\Phi$ (similarly, $\Psi$) can be computed by checking the
predicate in the scope of the $(\exists X)$ quantifier for satisfiability for each valuation
of the boolean free variables. The decidability of this satisfiability question is
also assumed for the symbolic minimization algorithms [BFH90,LY92,HHK95].

The massaging step ($massage : (e, C) \mapsto (\bar{e}, newC, fP)$) simplifies the
expression $e$ and replaces atomic predicates by corresponding boolean variables,
defining new boolean variables for predicates not already in $C$. These new
predicates are collected in $fP$, and $C$ is updated to $newC$ by adding the new
predicate-variable correspondences. The resulting expression is denoted by $\bar{e}$.
We also use this notation to let $\bar{S}$ represent the set of boolean variables
corresponding to the atomic predicates in set $S$. The simplifications accelerate the
convergence of the algorithm and are thus necessary in practice, but not in
theory, as shown in Theorems 3 and 4. Examples of simplification rules are:
$((\text{if } c \text{ then } e \text{ else } f) < g) = \text{if } c \text{ then } (e < g) \text{ else } (f < g)$, $(true \wedge x) = x$,
$(x = x) = true$, $((x + y) < (x + z)) = (y < z)$, $(\text{if } c \text{ then } e \text{ else } c) = (c \wedge e)$.
For example, if the expression $e$ is $(\text{if } x = u \text{ then } y \text{ else } x) = u$ and the current
correspondence table is $\{(x = u, b)\}$, the massaging step produces the new table
$\{(x = u, b), (y = u, c)\}$ and the massaged expression $\bar{e} = (b \wedge c)$.
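
The worked example of the massaging step can be reproduced with a small Python sketch. It implements only the two rules that the example needs (moving an equality inside a conditional, and rewriting "if c then e else c" to a conjunction); the tuple encoding and the names `lift_ite` and `massage` are assumptions made for illustration, not the paper's implementation.

```python
def lift_ite(e):
    """Rewrite (if c then t else f) = g  into  if c then (t = g) else (f = g)."""
    if isinstance(e, str):                 # a variable
        return e
    op, *args = e
    args = [lift_ite(a) for a in args]
    if op == "eq" and not isinstance(args[0], str) and args[0][0] == "ite":
        _, c, t, f = args[0]
        return ("ite", c, lift_ite(("eq", t, args[1])), lift_ite(("eq", f, args[1])))
    return (op, *args)

def massage(e, table, fresh_names):
    """Return (massaged expression, new table, list of newly found predicates)."""
    new_table = dict(table)
    new_preds = []

    def abstract(f):
        if f[0] == "eq":                   # an atomic predicate
            if f not in new_table:
                new_table[f] = next(fresh_names)
                new_preds.append(f)
            return new_table[f]
        if f[0] == "ite":
            c, t, el = (abstract(a) for a in f[1:])
            # Simplification rule: (if c then e else c) = (c and e).
            return ("and", c, t) if el == c else ("ite", c, t, el)
        raise ValueError("unsupported operator: %r" % (f[0],))

    return abstract(lift_ite(e)), new_table, new_preds

# e is  (if x = u then y else x) = u,  and the table is  {(x = u, b)}.
e = ("eq", ("ite", ("eq", "x", "u"), "y", "x"), "u")
massaged, new_table, new_preds = massage(e, {("eq", "x", "u"): "b"}, iter(["c", "d"]))
```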

There are several interesting claims that can be made about the algorithm,
despite its simplicity. The algorithm is sound, in that the abstract program is
guaranteed to simulate the original, with respect to the initial set of atomic
predicates, AP. A more interesting fact is that the algorithm is also complete,
in that, if the TS of the original program has a finite simulation (bisimulation)
quotient, then iterating the main loop (step 2) of the algorithm sufficiently many
times (i.e., with a large enough value for $K$) results in an abstract program whose
TS is simulation-equivalent (bisimulation-equivalent) to the original with respect
to AP. These propositions are stated precisely below. Due to space limitations,
we present only a sketch of the proof. We use the following notation: for an
abstract state $t$ (which is always defined over oldPred), $\uparrow t$ (read as “up t”)
denotes the set of concrete states that agree with $t$ on the valuations of the atomic
predicates in oldPred; precisely, $\uparrow t = \{s \mid (\forall P : P \in oldPred : P(s) = P(t))\}$. Let
$\mathcal{A}$ be the TS of the abstract program, and $\mathcal{C}$ the TS of the concrete program.

Lemma 0 (Invariance Lemma) For every state $t$ of $\mathcal{A}$, $\uparrow t$ is a non-empty set
of concrete states.

Proof Sketch. The formula $\Phi$ ensures this for initial states, while $\Psi$ and the
wlp computation ensure that the invariant holds. □

1. The initial set of atomic predicates consists of those in the specification formula,
the initial condition of the program, and the transition relation. This forms the
set of unexamined predicates, newPred. The set oldPred of already examined
predicates is initially empty. The initial correspondence table $C$ is empty.

2. While newPred is non-empty, and the iteration bound $K$ is not reached,

a) Initialize freshPred to the empty set.

b) For each predicate $P$ in newPred and each action $a$, compute
$(\bar{e}, C, fP) := massage(wlp_a(P), C)$, and add the predicates
in $fP$ to freshPred.

c) Compute $newPred, oldPred := freshPred, oldPred \cup newPred$.

3. The abstract program $(\bar{X}, \bar{I}, \bar{T}, \bar{F})$ is formed as follows.

- $\bar{X} = b \cup c$, where $b$ is the set of boolean variables corresponding to oldPred,
and $c$ is the set of boolean variables corresponding to newPred.

- $\bar{I}(b)$ is defined as $massage(I, C) \wedge \Phi$, where
$\Phi = (\exists X :: I(X) \wedge (\bigwedge i : P_i \in oldPred : b_i = P_i(X)))$.

- $\bar{T}(bc, b'c')$ is defined as
$(\bigvee a :: \Psi \wedge (\bigwedge i : P_i \in oldPred : b_i' = massage(wlp_a(P_i), C)))$, where
$\Psi = (\exists X :: (\bigwedge i : Q_i \in newPred : c_i = Q_i(X)))$.

- As we are not considering fairness, both $F$ and $\bar{F}$ are true.

Fig. 3. The abstraction algorithm
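
Steps 1 and 2 of Figure 3 amount to a bounded worklist iteration. The Python sketch below shows only that loop. The callback `wlp_atoms`, which stands for "compute wlp, massage it, and return the atomic predicates that remain", is a hypothetical parameter, and the toy counter program used to exercise the loop is an assumption made for illustration.

```python
def refine(initial_preds, actions, wlp_atoms, K):
    """Run at most K rounds of predicate discovery (steps 1 and 2 of Figure 3).

    wlp_atoms(pred, action) is assumed to return the set of atomic predicates
    occurring in the simplified wlp of pred with respect to action.
    Returns (oldPred, newPred); newPred is empty iff the iteration converged.
    """
    old_pred, new_pred = set(), set(initial_preds)
    rounds = 0
    while new_pred and rounds < K:
        fresh = set()
        for p in new_pred:
            for a in actions:
                fresh |= wlp_atoms(p, a)
        old_pred |= new_pred
        new_pred = fresh - old_pred        # keep only predicates not seen before
        rounds += 1
    return old_pred, new_pred

def counter_wlp_atoms(pred, action):
    """Toy program over a natural x with actions  inc: x := x + 1,  reset: x := 0.

    A predicate ("ge", k) stands for x >= k.
    """
    _, k = pred
    if action == "inc":
        # wlp is x + 1 >= k, i.e. x >= k - 1, which is true when k - 1 <= 0.
        return {("ge", k - 1)} if k - 1 > 0 else set()
    return set()                           # reset: wlp is the constant 0 >= k

converged = refine({("ge", 3)}, ["inc", "reset"], counter_wlp_atoms, K=10)
cut_off = refine({("ge", 3)}, ["inc", "reset"], counter_wlp_atoms, K=2)
```

With a large enough bound the loop converges with an empty newPred, which is the situation covered by Theorem 2; with `K=2` one predicate is left unexamined and would be represented by a boolean input variable.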



Theorem 1 (Simulation Theorem) The finite state abstract program
simulates the concrete program w.r.t. the set of predicates AP.

Proof Sketch. The claim is proved by showing that the relation $R$ defined by
$(s,t) \in R$ iff $s \in \uparrow t$ is a simulation relation from $\mathcal{C}$ to $\mathcal{A}$. As $AP \subseteq oldPred$, this
relation preserves the values of the predicates in AP. □

Theorem 2 (Bisimulation Theorem) If $newPred = \emptyset$ on termination of
step 2 of the abstraction algorithm, then the finite state abstract program is
bisimulation-equivalent to the concrete program w.r.t. the set of predicates AP.

Proof Sketch. If $newPred = \emptyset$ upon termination then, for each atomic
predicate $P$ in oldPred and each action $a$ of the concrete program, $wlp_a(P)$ is a
predicate over oldPred. As each action $a$ is deterministic, for each predicate $P$,
the transition $\bar{P}' = massage(wlp_a(P), C)$ (step 3) exactly captures the change
to $P$ after execution of action $a$. Let $B$ be the relation on the disjoint union
of the abstract and concrete programs defined by $(s,t) \in B$ iff $s \in \uparrow t \vee t \in \uparrow s$.
The claim is proved by showing that $B$ is a bisimulation, under the condition
$newPred = \emptyset$. □

Theorem 3 (Bisimulation Completeness) If the concrete program has a
finite reachable bisimulation quotient, there is an appropriate choice for the
iteration bound $K$ such that the abstract program produced by the algorithm is
bisimulation-equivalent to the concrete program w.r.t. AP.

Proof Sketch. Suppose that the concrete program has a finite reachable
bisimulation quotient. By the results in [BFH90,LY92], the states of this quotient
can be calculated as a finite partition $\Pi$ by a symbolic partition refinement
algorithm. The classes of $\Pi$ are defined by boolean combinations of a finite set of
atomic predicates, $\mathcal{P}$, that includes AP. The minimization algorithm computes
the formulae defining these classes by repeated wlp computations. As wlp
distributes over all boolean operators, these wlp computations may be rewritten to
apply wlp only to atomic predicates, which is what our algorithm does. Since
the unexamined predicate set newPred is obtained through wlp computations in
a breadth-first manner, the algorithm eventually generates every predicate
necessary for describing the classes of $\Pi$. Let $K$ be the least iteration after which
$\mathcal{P} \subseteq oldPred$ holds.

While the classes of the quotient can be described using a finite number
of atomic predicates, our algorithm considers each predicate individually, not
as part of a class formula. Hence, it is possible for our algorithm to generate
new predicates (even semantically new predicates) beyond the $K$th iteration. If
newPred is empty after $K$ iterations, the claim follows by Theorem 2. Otherwise,
define the relation $R$ by $(s,t) \in R$ iff the concrete state $s$ and the set of concrete
states $\uparrow t$ are both included in the same class of $\Pi$. As the extra predicates
in $oldPred \setminus \mathcal{P}$ are unnecessary, the relation $B$ defined by $sBt$ iff $(sRt \vee tRs)$
can be shown to be a bisimulation between $\mathcal{C}$ and $\mathcal{A}$. As $AP \subseteq oldPred$, this
bisimulation preserves the predicates in AP. □

Theorem 4 (Simulation Completeness) If the concrete program has a finite
simulation quotient, there is an appropriate choice for the iteration bound $K$ such
that the abstract program produced by the algorithm is simulation-equivalent to
the concrete program w.r.t. AP.

Proof Sketch. There is a partition refinement algorithm [HHK95] to compute
the greatest simulation relation that also employs wlp computations in a manner
similar to the bisimulation minimization algorithms. The claim then follows from
arguments similar to those in the proof of Theorem 3. □

4 Extensions to the Algorithm 

4.1 Retaining a Set of Finite-Domain Control Variables 

The algorithm can be easily modified to retain a set of finite-domain control
variables $V$ while abstracting out the rest ($X \setminus V$). We assume that $V$ is closed
under next-state dependencies; formally that, for any predicate $P(V)$, $wlp_a(P(V))$
does not introduce any atomic predicates over $X \setminus V$ other than those already
present in the action $a$. During the massaging process, atomic predicates over $V$
are not replaced with corresponding boolean variables. After step 2 terminates,
the concrete program transitions for the $V$-variables are massaged and copied
over to the abstract program. This modification was used in the Bakery example
to retain the variables $st_1, st_2$. It is particularly useful when data variables are
to be abstracted while retaining control variables.

4.2 Handling Fairness

There are two ways in which fairness may be specified. In the first type, one
may specify fairness constraints on the actions of the program. Since the
abstract program is strongly similar (bisimilar) to the original, $A\mu$ ($\mu$) properties
over fair computations that hold in the abstract program also hold of the
original. If fairness is specified instead by constraints on program states, we can add
the atomic predicates from these constraints to newPred in step 1 of the
algorithm and, in step 3, massage the fairness conditions to get the corresponding
conditions for the abstract program.



4.3 Abstracting Programs with Unbounded Nondeterminacy 

We presented our algorithm for programs with bounded non-determinacy, which
was exploited by considering each deterministic action individually. For
transition relations that exhibit unbounded nondeterminacy, this partitioning is not
possible, so we adopt a slightly different strategy. The key idea is to replace the
computation (in step 2) of wlp for individual predicates with a computation of
wlp for all clauses formed out of $oldPred \cup newPred$ (a clause is a disjunction of
literals, where each literal is an atomic predicate or its negation). This algorithm
is both sound and complete, in the sense used earlier; however, it is probably
impractical as an exponential number of clauses is generated in each iteration
of step 2. We present below a simpler algorithm which is sound but not
complete; however, as shown in the next section, it generalizes existing algorithms
for data-insensitive programs.

Consider a partitioning of the transition relation into actions of the form
$[W] : g(V,W) \rightarrow V := e(V,W)$, where $W$ is a set of unbounded input variables,
and $V$ is a set of state variables. Hence, $wlp_a(P(V))$ is $(\forall W :: wlp'_a(P))$, where
$wlp'_a(P)$ is $g(V,W) \Rightarrow P(W, e(V,W))$. The expression $wlp'_a(P)$ may contain
two types of predicates: state predicates over $V$ and “mixed” predicates over
$V \cup W$. Step 2 of the original algorithm is replaced with the following.

1. Compute $wlp'_a(P)$ repeatedly with only the state predicates in newPred until
no new state predicates are generated or the iteration bound $K$ is reached.

2. If the iteration bound $K$ is reached, proceed as before. Otherwise, define a
boolean input variable $c_i$ for each mixed predicate $P_i$ in $newPred \cup oldPred$,
and compute $\Gamma = (\exists W :: (\bigwedge i :: c_i = P_i(W, V)))$, which is the condition for a
$c$-valuation to be consistent. Quantifier-elimination results in a definition of
$\Gamma$ as a predicate on $V \cup \{c_i\}$. If there are new state predicates in $\Gamma$, add those
to newPred and return to the first step, otherwise conjoin $massage(\Gamma, C)$ to
the abstract transition relation.



Theorem 5 (i) The modified algorithm always produces a conservative
abstraction. (ii) If the algorithm terminates at step 2 with $newPred = \emptyset$, the abstract
program is bisimilar to the concrete program w.r.t. AP. □

5 Applications

From the completeness results, our method can be applied to any program that 
has a finite quotient relative to the atomic predicates in the specification. Specific 
types of programs are particularly amenable to the application of this algorithm. 



5.1 Data-Insensitive Programs 

We define data-insensitive programs as those that have a finite simulation
quotient that preserves the values of all control variables. Hence, the control-flow is
dependent on only a finite number of data predicates. Several papers describe
restrictions on program syntax to ensure data-insensitivity, and provide
abstraction algorithms which replace data domains by small finite domains, keeping
the program actions unchanged [Wol86,HB95,ID96,Laz99]. This is justified by
showing a bisimulation between the large- and small-domain instances.

Our program transformation method terminates for each of the above classes
of programs. For instance, in [ID96], the only atomic predicate is $=$, and every
assignment has the form $X := Y$. As there are only finitely many distinct atomic equality
predicates over the $n$ variables $x_i \in X$, our algorithm terminates after a bounded number of
steps, creating a bisimulation-equivalent abstraction by Theorem 5. The
transformation of [ID96] replaces each data domain with $[0 \ldots n]$, hence the abstract
state is represented by $n \cdot \log(n)$ bits. In contrast, if only a few combinations of
equality predicates are necessary, our method will result in states representable
with fewer than $n \cdot \log(n)$ bits. One may also show the following theorem.

Theorem 6 For programs without unbounded inputs where assignments are of
the form $X := Y$, our abstraction algorithm terminates with a
bisimulation-equivalent abstract program.

Our algorithm also terminates on the program below, which makes it possible to show
that the program has an infinite computation; this cannot be shown with a finite-domain method
(cf. [HB95]). Thus, our algorithm is strictly more powerful than the finite-domain
methods.

var x : natural
initially x = 0
action a [i : natural] (x < i) → x := i

5.2 Symmetric Programs 

Bisimulation reductions for semantically symmetric programs have been
proposed in [ES93,CFJ93]. It is computationally difficult, however, to implement
such reductions symbolically (i.e., with BDD’s) [CFJ93]. Hence, [ET99] consider
syntactically symmetric programs, defined using symmetric predicates such as
$(\forall i :: P(i))$. The reduction of such a program with $n$ processes, each with $k$
local states, is obtained by introducing $k$ variables $\{x_i\}$, each with a domain of
$[0 \ldots n]$. By considering symmetric predicates to be atomic, our algorithm can
be applied to such programs. In the worst case, it may produce all predicates of
the form $x_i > l$ for each $l \in [0 \ldots n]$, requiring $k \cdot (n + 1)$ (correlated) boolean
variables, but with a poly($n$) size BDD for each action. On the other hand, as our
algorithm calculates only those predicates necessary for the reduction, it may
also produce a program with fewer than the $k \cdot \log(n)$ bits required by [ET99].

3 A similar tradeoff occurs in finite-domain [PRSS99] vs. predicate abstraction
[SGZ+98] approaches to verifying combinational circuits over integer variables.

6 Related Work and Conclusions 

Among related work, [GS97,CABN97,BLO98,CU98] also propose predicate
abstraction methods. [CABN97] performs a simple syntactic transformation, but
requires the use of a constraint solver during the model checking process. The
methods of [GS97,BLO98,CU98] utilize general-purpose theorem proving to
compute the abstract program, which is defined over boolean variables that
correspond to an a priori fixed set of predicates. If the verification fails on the abstract
program, the set of predicates is refined heuristically, and the abstraction
process is repeated. In contrast, our algorithm, which has the same initial choice of
predicates, both refines the set of predicates and computes the abstract program
automatically, in a manner that is shown to be both sound and complete.

The papers [Wol86,ID96,HB95,Laz99] present algorithms for abstracting
syntactically restricted programs. As shown in the previous section, our algorithm
can be applied with guaranteed termination to these classes of programs, and is
more generally applicable.

The symbolic minimization algorithms in [BFH90,LY92,HHK95] produce an 
explicit abstract transition system, which can be quite large and may preclude 
the effective application of symbolic model checking and partial order reduction 
methods. In contrast, our algorithm produces an implicit program description in 
the same syntax as the original program, which can then be analyzed using any 
model checking method. Our completeness results guarantee that our algorithm 
can find an equivalent finite abstract program, if one exists, given an appropriate 
termination bound. 

The deductive model checking algorithm of [SUM99] produces an
abstraction relative to an LTL specification by a process of iterative refinement which,
however, requires significant human intervention. In [KP00], it is shown that
finite state abstractions exist for programs that satisfy LTL properties; the
completeness proof is non-constructive in general but partly utilizes predicate
abstraction. Other automatic abstraction methods [DGG93,GGL94] apply only to
finite state systems. Other semi-algorithms [HGD95,KMM+97,BGP97,BDG+98]
directly model check infinite state systems without computing an abstract
program. The algorithm in [GS92] for model checking a special type of parameterized
system relies on a trace equivalent abstraction.

We have hand-simulated our algorithm for a simple data transfer protocol
that transmits natural numbers with 0 representing null data. The correctness
property is that data that is sent is eventually received. The abstracted protocol
is verified by COSPAN using fewer resources (time, space) than the verification
of the original protocol with data domain size 1. We are currently developing a
prototype implementation to experiment with larger examples.

Temporal-Logic Queries



William Chan* 



Abstract. This paper introduces temporal-logic queries for model understanding
and model checking. A temporal-logic query is a temporal-logic formula in which
a placeholder appears exactly once. Given a model, the semantics of a query is a
proposition that can replace the placeholder to result in a formula that holds in the
model and is as strong as possible. The author defines a class of CTL queries that
can be evaluated in linear time, and shows how they can be used to help the user
understand the system behaviors and obtain more feedback in model checking.



1 Introduction 

Although model checking was proposed as a verification (or falsification) technique [3], 
we find it valuable for model understanding: The user hypothesizes a behavior of the 
system, expresses it as a temporal-logic formula, and attempts to use the model checker 
to validate the hypothesis. The process is iterated while the user gains knowledge about 
the system. In our opinion, this use of model checking has not been emphasized enough 
in the literature. To further help the user understand system behaviors, in this paper we 
introduce temporal-logic queries and use a technique similar to symbolic model checking 
to infer temporal properties as opposed to merely checking them. 

This work was partly motivated by the recent interest in deriving invariants of
software for comprehension, documentation, or evolution [6, 10, 12]. Inferring invariants is
not a new idea, but the traditional objective is to assist in theorem proving [e.g., 2]. We 
believe that inferring properties is particularly useful for software models, which, unlike 
hardware, often lack explicit correctness criteria. 

Let us first consider inferring invariants. Observe that when the reachable state space 
can be computed symbolically, we in fact obtain the strongest invariant of the system. 
Although this invariant contains a tremendous amount of useful information, it is likely 
to be too complex for the user to understand. But note that most interesting invariants in 
practice involve only a small number of atomic propositions. So, one way to extract useful 
information from the reachable states is to project its symbolic representation onto a small 
subset of atomic propositions, thereby deriving a weaker, but more comprehensible, 
invariant. (In fact, in version 2.5 of CMU’s SMV model checker, a user can select a
subset of atomic propositions on which to project the reachable states.) We have found
situations in which insightful information can be obtained even if we project the reachable
states on only singleton sets or pairs of atomic propositions.

Inferring invariants is a special case of evaluating temporal-logic queries. A query
is a formula in which a special symbol ?, called a placeholder, appears exactly once.
An example CTL query is AG?. The semantics of a query is a proposition $p$ such that
replacing ? with $p$ in the query results in a formula that holds in a given model and is
as strong as possible. Therefore, the query AG? evaluates to the strongest $p$ that makes
AG$p$ hold; in other words, it represents the strongest invariant. More complex examples
include $AG(req \rightarrow A((? \vee AG\,\neg ack)\; W\; ack))$, which asks what is true between the receipt
of a request and the transmission of an acknowledgement, and $AF(startup\_complete \vee AG\,?)$,
which roughly asks, what is eventually always true in the case that the startup
operation cannot be completed.

As it turns out, however, not every query is meaningful. For example, in most models, 
the desired proposition defined above cannot be found for the query AF?. Indeed, we 
show that identifying the “valid” queries is intractable (EXPTIME-complete for CTL 
queries), so we resort to a conservative approach: We define a class of CTL queries 
that are guaranteed to be valid, and furthermore can be evaluated in time linear in the 
size of the model and in the length of the query. That is, the asymptotic worst-case 
time complexity is the same as that of CTL model checking. Though syntactically quite 
restricted, the class contains interesting queries such as the two examples given above. 

In addition to deriving temporal properties based on a given pattern, the technique can
also be used to provide more feedback to the user in model checking, such as providing
a partial explanation when the property checked holds, and diagnostic information when
it does not. Suppose we would like to check the invariant $AG(x \vee y)$. We can evaluate the
query $AG\,?$. Assume that after projecting the strongest invariant on $x$ and $y$, we obtain
$AG(x \wedge y)$ as an inferred formula. Note that this formula is stronger than the one we
wanted to check. Not only can we conclude that $AG(x \vee y)$ holds, we can also inform
the user of the stronger property. This information can serve either as an explanation
of the verification result ($x \vee y$ is an invariant because $x \wedge y$ is), or as an indication
that the checked property is in a sense vacuously true. Furthermore, in case a formula
is falsified, apart from obtaining a counterexample, the user can pose queries to acquire
more feedback. For example, if $AG(\mathit{req} \rightarrow AF\,\mathit{ack})$ is false, that is, a request is not always
followed by an acknowledgement, we can ask what can guarantee an acknowledgement:
$AG(? \rightarrow AF\,\mathit{ack})$.

The rest of the paper is organized as follows. We review CTL model checking in
Section 2. Section 3 gives the main technical results, while Section 4 explains how to
simplify a proposition so the user can understand it. Section 5 describes some initial
experience with the technique. We conclude in Section 6 with some discussion of future
work.



2 Background 

This section gives an overview of CTL model checking [3]. Note that, to facilitate our 
definition of CTL queries, our formulation of CTL is non-standard. 

A model is a tuple $(Q, Q_0, \Delta, X, L)$, where $Q$ is a finite set of states, $Q_0 \subseteq Q$
is the set of initial states, $\Delta \subseteq Q \times Q$ is the transition relation, $X$ is a finite set of
atomic propositions, and the function $L : Q \rightarrow 2^X$ maps each state to the set of atomic
propositions that are true at the state. The transition relation $\Delta$ is assumed to be total;
that is, for every $q \in Q$, there exists a successor $q' \in Q$ with $(q, q') \in \Delta$. A path is an
infinite sequence of states in which each consecutive pair of states belongs to $\Delta$. A state
is reachable if it appears on some path starting from some initial state.

Properties about a model can be specified in the Computation Tree Logic (CTL). 
CTL formulas consist of atomic propositions, Boolean operators, path quantifiers, and
temporal operators. Formally,

- any atomic proposition and true are CTL formulas, and

- if $\varphi$ and $\psi$ are CTL formulas, then $\neg\varphi$, $\varphi \vee \psi$, $AX\varphi$, $A(\varphi \mathbin{\mathring{W}} \psi)$, and $A(\varphi \mathbin{U} \psi)$ are
also CTL formulas.

As usual, A is the universal path quantifier, X is the next-time operator, and U is the
strong until operator. We call the operator $\mathring{W}$ the overlapping weak until operator, which
is like the dual of U, that is, $\varphi \mathbin{\mathring{W}} \psi = \neg(\neg\psi \mathbin{U} \neg\varphi)$.* Intuitively, $X\varphi$ means that $\varphi$ is
true in the next state, $\varphi \mathbin{U} \psi$ means that $\varphi$ remains true until $\psi$ becomes true, and $\varphi \mathbin{\mathring{W}} \psi$
means that either $\varphi$ is true forever, or $\varphi$ remains true until both $\varphi$ and $\psi$ become true. A
formula is propositional (or, simply, abusing terminology, a proposition) if it does not
contain the temporal operators X, $\mathring{W}$, or U.

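Assume a fixed model $M$. We write $q \models \varphi$ if the CTL formula $\varphi$ is true at state $q$.
The truth value of a CTL formula at a state $q_0$ is then defined as follows ($x$ is any atomic
proposition, and $\varphi$ and $\psi$ are CTL formulas):

1. $q_0 \models \mathit{true}$.
2. $q_0 \models x$ iff $x \in L(q_0)$.
3. $q_0 \models \neg\varphi$ iff it is not the case that $q_0 \models \varphi$.
4. $q_0 \models \varphi \vee \psi$ iff either $q_0 \models \varphi$ or $q_0 \models \psi$.
5. $q_0 \models AX\varphi$ iff $q_1 \models \varphi$ for every $q_1$ with $(q_0, q_1) \in \Delta$.
6. $q_0 \models A(\varphi \mathbin{U} \psi)$ iff for every path $q_0, q_1, q_2, \ldots$, there exists an $i \geq 0$ with $q_i \models \psi$, and $q_j \models \varphi$ for all $j < i$.
7. $q_0 \models A(\varphi \mathbin{\mathring{W}} \psi)$ iff for every path $q_0, q_1, q_2, \ldots$, for all $i \geq 0$, if $q_j \models \neg\psi$ for all $j < i$, then $q_i \models \varphi$.

We write $S \models \varphi$ if $q \models \varphi$ for each $q \in S$. We say that $M$ satisfies $\varphi$, written $M \models \varphi$,
if $\varphi$ is true at every initial state of $M$. The CTL model-checking problem is, given a
model and a CTL formula, to determine whether the model satisfies the formula. A formula
is valid if it is satisfied by every model. As usual, we write $\varphi \Rightarrow \psi$ if $(\neg\varphi) \vee \psi$ is valid,
and write $\varphi \Leftrightarrow \psi$ if $\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$. Note that if we have $\varphi \Rightarrow \psi$ and $M \models \varphi$, then
we also have $M \models \psi$.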

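We define the usual abbreviations $\mathit{false} = \neg\mathit{true}$, $\varphi \rightarrow \psi = (\neg\varphi) \vee \psi$, and $\varphi \wedge \psi = \neg(\neg\varphi \vee \neg\psi)$, as well as

$$AG\varphi = A(\varphi \mathbin{\mathring{W}} \mathit{false}) \qquad AF\varphi = A(\mathit{true} \mathbin{U} \varphi).$$

The operator G is the global operator, and F is the future operator. For symmetry, we
also define the weak until operator $W$ and the overlapping strong until operator $\mathring{U}$:

$$A(\varphi \mathbin{W} \psi) = A((\varphi \vee \psi) \mathbin{\mathring{W}} \psi) \qquad A(\varphi \mathbin{\mathring{U}} \psi) = A(\varphi \mathbin{U} (\varphi \wedge \psi)).$$

* In other words, the operator $\mathring{W}$ is the same as the so-called “release” operator V with the
operands swapped: $\varphi \mathbin{\mathring{W}} \psi = \psi \mathbin{V} \varphi$ [9]. Our definition of CTL might remind the reader of the
sublogic ACTL. We are actually defining the full CTL here because we allow negating arbitrary
formulas. We do not explicitly define the existential fragment because it is not needed in this
paper.

In addition, it can be shown that

$$A(\varphi \mathbin{\mathring{W}} \psi) \Leftrightarrow A(\varphi \mathbin{W} (\varphi \wedge \psi)) \qquad A(\varphi \mathbin{U} \psi) \Leftrightarrow A(\varphi \mathbin{W} \psi) \wedge AF\psi.$$

To summarize the different versions of until operators, intuitively $\varphi \mathbin{\mathcal{U}} \psi$ means that $\varphi$
remains true until $\psi$ is true if $\mathcal{U} \in \{W, U\}$, and until both $\varphi$ and $\psi$ are true if $\mathcal{U} \in \{\mathring{W}, \mathring{U}\}$.
If $\mathcal{U} \in \{W, \mathring{W}\}$, we also allow $\psi$ to never hold provided $\varphi$ is true forever.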

For any CTL formula $\varphi$, let $[\![\varphi]\!]$ denote the set of states in which $\varphi$ is true. The
model-checking problem is then equivalent to determining whether the set of initial states is
a subset of $[\![\varphi]\!]$. For any atomic proposition $x$, the set $[\![x]\!]$ is easy to find. For any state
set $S$, define $pre_\forall(S)$ as

$$pre_\forall(S) = \{\, q \in Q \mid \forall q'.\ \text{if } (q, q') \in \Delta \text{ then } q' \in S \,\},$$

the set of states with every successor in $S$. The following equations hold for any CTL
formulas $\varphi$ and $\psi$:

$$[\![\neg\varphi]\!] = Q \setminus [\![\varphi]\!] \qquad [\![A(\varphi \mathbin{U} \psi)]\!] = \mu Z.\,\bigl([\![\psi]\!] \cup ([\![\varphi]\!] \cap pre_\forall(Z))\bigr)$$

$$[\![\varphi \vee \psi]\!] = [\![\varphi]\!] \cup [\![\psi]\!] \qquad [\![A(\varphi \mathbin{\mathring{W}} \psi)]\!] = \nu Z.\,\bigl([\![\varphi]\!] \cap ([\![\psi]\!] \cup pre_\forall(Z))\bigr)$$

$$[\![AX\varphi]\!] = pre_\forall([\![\varphi]\!])$$

where $\mu$ and $\nu$ are respectively the least fixed-point and greatest fixed-point operators. It
can be shown that $pre_\forall$ and these fixed points can be computed in time linear in the size
of the model, which is denoted $|M|$ and defined as $|Q| + |\Delta|$. This suggests a
model-checking algorithm that, given $\varphi$, recursively converts the subformulas of $\varphi$ to sets of
states in an inside-out manner. The algorithm thus runs in time linear in the size of the
model and in the length of the formula. Because the formula is evaluated by computing
predecessors of states, we say that the algorithm is based on backward traversals.
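
The backward-traversal evaluation can be sketched for explicit state sets as follows. This is a minimal illustration, not the linear-time algorithm: it iterates the fixed points naively, and the tuple encoding of formulas (with the tag `"AWo"` for the overlapping weak until), the helper names, and the three-state model are all assumptions made for the example.

```python
def pre_all(states, succ, S):
    """pre_forall(S): states all of whose successors lie in S."""
    return {q for q in states if all(r in S for r in succ[q])}


def fixpoint(f, start):
    """Iterate f from `start` until the set stops changing."""
    Z = set(start)
    while True:
        nxt = f(Z)
        if nxt == Z:
            return Z
        Z = nxt


def sat(phi, states, succ, label):
    """Set of states satisfying phi, a nested tuple such as ("AU", a, b)."""
    def rec(f):
        return sat(f, states, succ, label)
    op = phi[0]
    if op == "true":
        return set(states)
    if op == "atom":
        return {q for q in states if phi[1] in label[q]}
    if op == "not":
        return set(states) - rec(phi[1])
    if op == "or":
        return rec(phi[1]) | rec(phi[2])
    if op == "AX":
        return pre_all(states, succ, rec(phi[1]))
    if op == "AU":   # strong until: least fixed point, start from the empty set
        a, b = rec(phi[1]), rec(phi[2])
        return fixpoint(lambda Z: b | (a & pre_all(states, succ, Z)), set())
    if op == "AWo":  # overlapping weak until: greatest fixed point, start from Q
        a, b = rec(phi[1]), rec(phi[2])
        return fixpoint(lambda Z: a & (b | pre_all(states, succ, Z)), states)
    raise ValueError(op)


FALSE = ("not", ("true",))


def AG(f):
    return ("AWo", f, FALSE)       # AG f = A(f W° false)


def AF(f):
    return ("AU", ("true",), f)    # AF f = A(true U f)


# Hypothetical model: a chain 0 -> 1 -> 2 with a self-loop on 2.
states = {0, 1, 2}
succ = {0: [1], 1: [2], 2: [2]}
label = {0: set(), 1: set(), 2: {"done"}}
done = ("atom", "done")
af_done = sat(AF(done), states, succ, label)
ag_done = sat(AG(done), states, succ, label)
print(af_done, ag_done)
```

The model satisfies a formula when the initial states are contained in the returned set; in this example AF done holds everywhere, while AG done holds only in state 2.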



3 CTL Queries 



The placeholder is a special symbol ?. A CTL query is a string in which the placeholder
appears exactly once, and for any CTL formula $\varphi$, substituting $\varphi$ for ? results in
a CTL formula. A query is positive if the placeholder appears under an even number
of negations; otherwise, it is negative. For example, the query $AG\,?$ is positive, while
$AG(? \rightarrow AF\,\mathit{ack})$ is negative. We use $\gamma\langle\varphi\rangle$ to denote the result of replacing the
placeholder with $\varphi$ in the query $\gamma$.

For any query $\gamma$ and any formulas $\varphi$ and $\psi$, we write $\varphi \Rightarrow^{\gamma} \psi$ for $\varphi \Rightarrow \psi$ if $\gamma$ is
positive, and for $\psi \Rightarrow \varphi$ if $\gamma$ is negative. Intuitively, the lemma below shows that, if
we view a query as a function that maps formulas to formulas, then depending on the
polarity of the query, the function is either monotonically increasing or monotonically
decreasing.

Lemma 1. For all formulas $\varphi$ and $\psi$ and every query $\gamma$, if $\varphi \Rightarrow \psi$, then $\gamma\langle\varphi\rangle \Rightarrow^{\gamma} \gamma\langle\psi\rangle$.

Proof Idea. Apply structural induction. Note that the truth of this lemma relies on the
fact that we do not allow bi-implication in CTL queries. □

If we have $M \models \gamma\langle p\rangle$ for some proposition $p$, we say that $p$ is a solution to $\gamma$
in $M$. We write $\gamma\langle\top\rangle$ and $\gamma\langle\bot\rangle$ for $\gamma\langle\mathit{true}\rangle$ and $\gamma\langle\mathit{false}\rangle$ respectively if $\gamma$ is positive,
and for $\gamma\langle\mathit{false}\rangle$ and $\gamma\langle\mathit{true}\rangle$ respectively otherwise. Checking whether a given query has a
solution in a given model can be reduced to model checking.

Lemma 2. For every query $\gamma$ and every model $M$, we have 1. $\gamma$ has a solution in $M$
if and only if $M \models \gamma\langle\top\rangle$, and 2. every proposition is a solution to $\gamma$ in $M$ if and only
if $M \models \gamma\langle\bot\rangle$.

Proof. The if direction of 1 and the only-if direction of 2 are trivial. For the rest of
the proof: Consider any proposition $p$. Because $\mathit{false} \Rightarrow p \Rightarrow \mathit{true}$, Lemma 1 implies
$\gamma\langle\mathit{false}\rangle \Rightarrow^{\gamma} \gamma\langle p\rangle \Rightarrow^{\gamma} \gamma\langle\mathit{true}\rangle$, or $\gamma\langle\bot\rangle \Rightarrow \gamma\langle p\rangle \Rightarrow \gamma\langle\top\rangle$. Therefore, if $M \models \gamma\langle\bot\rangle$,
then $M \models \gamma\langle p\rangle$, proving the if direction of 2. Also, if $M \models \gamma\langle p\rangle$, then $M \models \gamma\langle\top\rangle$,
proving the only-if direction of 1. □

3.1 Exact Solutions and Valid Queries 

Finding an arbitrary solution to a query is not very interesting; finding all solutions does
not seem useful either, because there are likely to be too many of them. Instead, we would
like to find a single solution that summarizes all solutions. We say that a solution $s$ to $\gamma$
in $M$ is exact if for every solution $p$, we have $s \Rightarrow^{\gamma} p$. (It is not hard to see from Lemma 1
that, if $s$ is exact, then $p$ is a solution if and only if $s \Rightarrow^{\gamma} p$.) Solving a query means
finding an exact solution to a given query in a given model.

Note that not every query has an exact solution in a model. Consider the query $AF\,?$.
Suppose $x$ and $y$ are solutions. Note that $x \wedge y$ is not necessarily a solution because $x$ and
$y$ may hold at different states on the paths. In this case the query has no exact solution
in the model. We say that a query is valid if it has an exact solution in every model.
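
The failure of AF? can be reproduced on a tiny example. The sketch below is an illustration only: the three-state model and the representation of the propositions x and y as state sets are assumptions, and AF is computed with a naive least fixed point.

```python
def pre_all(succ, S):
    """States all of whose successors lie in S."""
    return {q for q in succ if all(r in S for r in succ[q])}


def af(succ, target):
    """States satisfying AF target: least fixed point of Z = target | pre_all(Z)."""
    Z = set()
    while True:
        nxt = target | pre_all(succ, Z)
        if nxt == Z:
            return Z
        Z = nxt


# Hypothetical model: x holds only in state 1, y only in state 2.
succ = {0: [1], 1: [2], 2: [2]}
x, y = {1}, {2}
initial = {0}

x_is_solution = initial <= af(succ, x)
y_is_solution = initial <= af(succ, y)
both_is_solution = initial <= af(succ, x & y)
print(x_is_solution, y_is_solution, both_is_solution)
```

Both x and y are solutions, but x ∧ y is not. An exact solution would have to imply both x and y, hence x ∧ y, and by Lemma 1 x ∧ y would then be a solution as well, so no exact solution exists in this model.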

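For any query $\gamma$ and any formulas $\varphi$ and $\psi$, we write $\varphi \wedge^{\gamma} \psi$ for $\varphi \wedge \psi$ if $\gamma$ is positive,
and for $\varphi \vee \psi$ if $\gamma$ is negative. We say that $\gamma$ is distributive over conjunction if, for any
propositions $p_1$ and $p_2$, we have $(\gamma\langle p_1\rangle \wedge \gamma\langle p_2\rangle) \Leftrightarrow \gamma\langle p_1 \wedge^{\gamma} p_2\rangle$.

Lemma 3. A query is valid if and only if it has a solution in every model and is
distributive over conjunction.

Proof. Fix any CTL query $\gamma$. For the if direction: Consider any model $M$. Let $P$ be the
non-empty set of solutions of $\gamma$ in $M$, and let $s$ be $\bigwedge^{\gamma} P$, which we claim is an exact
solution. By the definition of $P$, we know that $M \models \bigwedge_{r \in P} \gamma\langle r\rangle$. By distributivity, the
formula is equivalent to $\gamma\langle s\rangle$, and therefore $s$ is a solution. Clearly, we have $s \Rightarrow^{\gamma} r$ for
every $r \in P$, so $s$ is exact.

For the only-if direction: We only need to show that, assuming $\gamma$ is valid, it is
distributive over conjunction. That is, we want to show that for any model $M$, we have
$M \models (\gamma\langle p_1\rangle \wedge \gamma\langle p_2\rangle) \leftrightarrow \gamma\langle p_1 \wedge^{\gamma} p_2\rangle$ for all propositions $p_1$ and $p_2$. For the $\leftarrow$ direction:
Because we have $p_1 \wedge^{\gamma} p_2 \Rightarrow^{\gamma} p_1$, by Lemma 1, we have $\gamma\langle p_1 \wedge^{\gamma} p_2\rangle \Rightarrow \gamma\langle p_1\rangle$, which
implies $M \models \gamma\langle p_1 \wedge^{\gamma} p_2\rangle \rightarrow \gamma\langle p_1\rangle$. The argument for $p_2$ is symmetric. For the $\rightarrow$
direction: If $M$ does not satisfy $\gamma\langle p_1\rangle \wedge \gamma\langle p_2\rangle$, we are done. Otherwise, let $s$ be an
exact solution to $\gamma$ in $M$. By definition, we have $s \Rightarrow^{\gamma} p_1$ and $s \Rightarrow^{\gamma} p_2$, and therefore
$s \Rightarrow^{\gamma} (p_1 \wedge^{\gamma} p_2)$. Lemma 1 now implies that $\gamma\langle s\rangle \Rightarrow \gamma\langle p_1 \wedge^{\gamma} p_2\rangle$. Since by definition
$M \models \gamma\langle s\rangle$, we also have $M \models \gamma\langle p_1 \wedge^{\gamma} p_2\rangle$. □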

Unfortunately, identifying valid queries is hard in general. 

Theorem 1. Determining whether a given CTL query is valid is complete for EXPTIME.

Proof. By Lemma 3, it suffices to show that the problem of identifying queries that always
admit solutions and are distributive over conjunction is equivalent to CTL formula validity,
which is complete for EXPTIME [5, 7]. For a reduction to CTL validity: By Lemma 2, a CTL
query $\gamma$ always has a solution if and only if the formula $\varphi_1 = \gamma\langle\top\rangle$ is valid,
and, by definition, $\gamma$ is distributive if and only if the formula $\varphi_2 = (\gamma\langle x_1\rangle \wedge \gamma\langle x_2\rangle) \leftrightarrow
\gamma\langle x_1 \wedge^{\gamma} x_2\rangle$ is valid, where $x_1$ and $x_2$ are atomic propositions not appearing in $\gamma$. So
we simply check whether the formula $\varphi_1 \wedge \varphi_2$ is valid. To reduce from CTL validity,
observe that a CTL formula $\varphi$ is valid if and only if for an atomic proposition $x$ not
appearing in $\varphi$, the query $A((x \vee \varphi) \mathbin{W} ?)$ always has a solution and is distributive over
conjunction. □

We can solve a valid query using a naive approach: Enumerate the exponentially
many possible assignments to the atomic propositions. More explicitly, given a set $X$ of
atomic propositions, an assignment is a proposition of the form

$$\bigwedge_{x \in Y} x \;\wedge \bigwedge_{x \in X \setminus Y} \neg x$$

for some $Y \subseteq X$. A satisfying assignment of a proposition $p$ is an assignment $a$ with
$a \Rightarrow p$. Note that every proposition is equivalent to the disjunction of its satisfying
assignments. If $\gamma$ is a positive query and $P$ is the set of every proposition $\neg p$ such that
$p$ is an assignment and $\neg p$ is a solution to $\gamma$, then it can be shown that $\bigwedge P$ is an exact
solution. If $\gamma$ is negative, it can be shown that $\bigvee P'$ is an exact solution, where $P'$ is the
set of every assignment $p$ that is a solution to $\gamma$.

Lemma 4. Given a valid CTL query $\gamma$ and a model $M$ with atomic propositions $X$,
solving $\gamma$ in $M$ can be done in time $O(|M|\,|\gamma|\,2^{|X|})$.

Proof. We assume $\gamma$ is positive; the case when $\gamma$ is negative is similar. Let $s$ denote $\bigwedge P$,
where $P$ is defined above. We claim that $s$ is an exact solution. It is a solution by the
definition of $P$ and the distributivity of $\gamma$. To see that it is exact, we want to show $s \Rightarrow r$
for every solution $r$. We are done if $r$ is a tautology. Otherwise, let $A$ be the non-empty
set of all satisfying assignments of $\neg r$. Notice that $r$ is equivalent to $\bigwedge_{a \in A} \neg a$. Consider
any $a \in A$. By definition, we have $a \Rightarrow \neg r$, or $r \Rightarrow \neg a$. Because $r$ is a solution, by
Lemma 1, $\neg a$ is also a solution. So by the definition of $P$, we have $\neg a \in P$, and therefore
$s \Rightarrow \neg a$, for every $a \in A$. Hence, we have $s \Rightarrow \bigwedge_{a \in A} \neg a$, or equivalently, $s \Rightarrow r$.

Computing $P$ amounts to solving $2^{|X|}$ model-checking problems, each of which
takes time $O(|M|\,|\gamma|)$, so the total running time is $O(|M|\,|\gamma|\,2^{|X|})$. □
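
The enumeration for a positive query can be sketched as follows. This is an illustration under stated assumptions: propositions are represented as sets of assignments (tuples of truth values), `is_solution` is a hypothetical oracle that model-checks the query with the placeholder replaced, and the example instantiates it for AG? over a hand-written list of reachable valuations.

```python
from itertools import product


def solve_naive_positive(num_props, is_solution):
    """Exact solution of a valid positive query, as a set of satisfying assignments.

    `is_solution(p)` is an assumed oracle that model-checks the query with the
    placeholder replaced by p, where p is given as a set of assignments.
    """
    universe = set(product([False, True], repeat=num_props))
    exact = set(universe)
    for a in universe:                 # 2^|X| model-checking calls
        not_a = universe - {a}         # the proposition "not a"
        if is_solution(not_a):
            exact &= not_a             # conjoin every solution of this form
    return exact


# Hypothetical model over propositions (x, y): valuations of its reachable states.
reach_labels = [(True, False), (True, True)]


def ag_holds(p):
    """Model-check AG p: p must contain the valuation of every reachable state."""
    return all(v in p for v in reach_labels)


exact = solve_naive_positive(2, ag_holds)
print(sorted(exact))
```

The result contains exactly the assignments in which x is true, so the strongest invariant of this toy model over x and y is x.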

A time complexity exponential in the number of atomic propositions is hardly desirable.
But notice that only backward traversals are used. We now show how, using mixed
forward and backward traversals, the exponential factor can be removed for a subclass
of valid queries.

3.2 $\mathrm{CTL}^{v}$ Queries

Although valid queries are hard to identify, we can syntactically define classes of
queries that are guaranteed to be valid. Intuitively, there are two major cases in which a
query is not distributive over conjunction. The first case is when the placeholder appears
within the scope of a temporal operator that is under an odd number of negations, such
as $\neg AG\,?$. These queries are concerned with what happens on some paths. If $\varphi_1$ and
$\varphi_2$ are true on some paths, we do not know whether $\varphi_1 \wedge \varphi_2$ holds on any path. (We do
know that $\varphi_1 \vee \varphi_2$ is true on some paths, but this is not sufficient.) The second case is
when the placeholder appears on the right hand side of untils, e.g., $AF\,?$. Such queries
ask what will eventually happen. Even if $\varphi_1$ and $\varphi_2$ eventually hold, they may not hold in
the same states along the paths. There are many exceptions to the second case, however,
such as $A(\varphi \mathbin{W} \neg\varphi \wedge ?)$ and $AFAG\,?$. Our strategy is to define a class of queries that
excludes these known problems while allowing for the exceptions.

We first define two additional until operators, the disjoint weak until $\overline{W}$ and the
disjoint strong until $\overline{U}$, as

$$A(\varphi \mathbin{\overline{W}} \psi) = A(\varphi \mathbin{W} (\neg\varphi \wedge \psi)) \qquad A(\varphi \mathbin{\overline{U}} \psi) = A(\varphi \mathbin{U} (\neg\varphi \wedge \psi)).$$

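Formally, we define the class of $\mathrm{CTL}^{v}$ queries as the smallest set of queries satisfying
the following:

- $?$ and $\neg ?$ are $\mathrm{CTL}^{v}$ queries.

- If $\varphi$ is a CTL formula and $\gamma$ is a $\mathrm{CTL}^{v}$ query, then $\varphi \vee \gamma$, $AX\gamma$, $A(\gamma \mathbin{\mathring{W}} \varphi)$, and
$A(\varphi \mathbin{\overline{W}} \gamma)$ are also $\mathrm{CTL}^{v}$ queries.

- A persistence query is also a $\mathrm{CTL}^{v}$ query.

The class of persistence queries is defined as follows:

- If $\gamma$ is a $\mathrm{CTL}^{v}$ query, then $AG\gamma$ is a persistence query.

- If $\gamma$ is a persistence query and $\varphi$ is a CTL formula, then $\varphi \vee \gamma$, $AX\gamma$, $A(\varphi \mathbin{W} \gamma)$, and
$A(\varphi \mathbin{U} \gamma)$ are persistence queries.

Two queries $\gamma_1$ and $\gamma_2$ are equivalent, written $\gamma_1 \Leftrightarrow \gamma_2$, if we have $\gamma_1\langle\varphi\rangle \Leftrightarrow \gamma_2\langle\varphi\rangle$
for every $\varphi$. Additional $\mathrm{CTL}^{v}$ queries are allowed using these equivalences:

$$\gamma \vee \varphi \Leftrightarrow \varphi \vee \gamma \qquad AG\gamma \Leftrightarrow A(\gamma \mathbin{\mathring{W}} \mathit{false})$$

$$AF\gamma \Leftrightarrow A(\mathit{true} \mathbin{U} \gamma) \qquad A(\gamma \mathbin{W} \varphi) \Leftrightarrow A((\varphi \vee \gamma) \mathbin{\mathring{W}} \varphi).$$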

In other words, in $\mathrm{CTL}^{v}$ queries:

1. Negations can only be applied to the placeholder or to CTL formulas.
2. The placeholder cannot appear on either side of $\mathring{U}$ or $\overline{U}$, on the left hand side of $\overline{W}$ or $U$, or on the right hand side of $\mathring{W}$.
3. If the placeholder appears on the right hand side of $W$ or $U$, then there must be an AG between the placeholder and the until.

The first restriction on negation is to avoid querying existentially about
paths. The second restriction on untils is to ensure that we do not have any eventuality
obligation (which may not be fulfillable in every model). The third restriction rules out
queries like $AF\,?$ but allows valid queries like $AFAG\,?$.

Note that $\overline{W}$ and $\overline{U}$ are not monotone with respect to the left operand. But since we
do not allow the placeholder to appear on their left hand side, this is not a problem and
Lemma 1 still holds.

Examples of queries in this class include $A(\neg\mathit{shutdown} \mathbin{\overline{W}} ?)$: what is true when the
first shutdown occurs; $AG(\mathit{shutdown} \rightarrow AG\,?)$: what is invariably true after shutdown;
$AG(? \rightarrow AF\,\mathit{ack})$: what is true before an acknowledgement is sent; $AG(? \rightarrow \neg AF\,\mathit{ack})$:
what is true so that an acknowledgement may never be sent; $AFAG\,?$: what is the set of
persistent states, or, roughly, the set of states within which the system eventually stays;
and the more complex examples given in Section 1. Note that the notion of persistence
here is that of branching time, which seems less useful than the linear-time notion.
We will come back to this issue in Section 6.

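Not only are these queries guaranteed to be valid, they can be efficiently solved by
mixing forward and backward traversals, i.e., by applying $pre_\forall$ and $post_\exists$, where

$$post_\exists(S) = \{\, q' \in Q \mid \exists q.\ (q, q') \in \Delta \text{ and } q \in S \,\},$$

the set of successors of states in $S$. For any CTL formula $\varphi$, the set $R_\varphi$ is defined as

$$R_\varphi = \mu Z.\,\bigl((S \cup post_\exists(Z)) \cap [\![\varphi]\!]\bigr),$$

or the set of states reachable from $S$ going through only the states that satisfy $\varphi$. Figure 1
shows a procedure Solve that takes a $\mathrm{CTL}^{v}$ query and a state set, and returns a state set.

$$
\begin{array}{lcl}
\mathit{Solve}(?, S) & = & S \\
\mathit{Solve}(\neg ?, S) & = & Q \setminus S \\
\mathit{Solve}(\varphi \vee \gamma, S) & = & \mathit{Solve}(\gamma,\ S \setminus [\![\varphi]\!]) \\
\mathit{Solve}(AX\gamma, S) & = & \mathit{Solve}(\gamma,\ post_\exists(S)) \\
\mathit{Solve}(A(\gamma \mathbin{\mathring{W}} \varphi), S) & = & \mathit{Solve}(\gamma,\ S \cup R_{\neg\varphi} \cup post_\exists(R_{\neg\varphi})) \\
\mathit{Solve}(A(\varphi \mathbin{\overline{W}} \gamma), S) & = & \mathit{Solve}(\gamma,\ (S \cup post_\exists(R_{\varphi})) \setminus [\![\varphi]\!]) \\
\mathit{Solve}(A(\varphi \mathbin{W} \gamma), S) & = & \mathit{Solve}(\gamma,\ B) \\
 & & \text{where } R = R_{\varphi \wedge \neg\gamma\langle\bot\rangle},\quad B = (S \cup post_\exists(R)) \setminus ([\![\varphi]\!] \cup [\![\gamma\langle\bot\rangle]\!]) \\
\mathit{Solve}(A(\varphi \mathbin{U} \gamma), S) & = & \mathit{Solve}(\gamma,\ B \cup C) \\
 & & \text{where } C = \nu Z.\,(R \cap post_\exists(Z)),\ R \text{ and } B \text{ are the same as above}
\end{array}
$$

Fig. 1: Solving $\mathrm{CTL}^{v}$ queries ($\gamma$ is any $\mathrm{CTL}^{v}$ query, $\varphi$ is any CTL formula, and $S \subseteq Q$
is any state set)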



The idea is that if $\gamma$ is any $\mathrm{CTL}^{v}$ query, $M$ is any model with initial states $Q_0$, and $S$ is
the result of $\mathit{Solve}(\gamma, Q_0)$, then the characteristic function of $S$, namely

$$\bigvee_{q \in S} \Bigl( \bigwedge_{x \in L(q)} x \;\wedge \bigwedge_{x \in X \setminus L(q)} \neg x \Bigr),$$

is an exact solution to $\gamma$ in $M$. The procedure Solve runs in time linear in the size of the
model and linear in the length of the query.
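
A fragment of Solve can be sketched for explicit state sets. The code below is an illustration under several assumptions: it follows one reading of the first six cases of Figure 1 (the two persistence cases are omitted), CTL subformulas are passed as precomputed state sets rather than evaluated, and the tuple tags, helper names, and four-state model are hypothetical.

```python
def post(succ, S):
    """post_exists(S): successors of states in S."""
    return {r for q in S for r in succ[q]}


def reach_within(succ, S, allowed):
    """Least fixed point of Z = (S | post(Z)) & allowed."""
    Z = set()
    while True:
        nxt = (S | post(succ, Z)) & allowed
        if nxt == Z:
            return Z
        Z = nxt


def solve(query, S, states, succ):
    """Return the state set whose characteristic function solves the query."""
    op = query[0]
    if op == "?":
        return set(S)
    if op == "not?":
        return states - S
    if op == "or":            # phi or gamma, phi given as a state set
        _, phi, gamma = query
        return solve(gamma, S - phi, states, succ)
    if op == "AX":
        return solve(query[1], post(succ, S), states, succ)
    if op == "AWo":           # A(gamma W° phi), overlapping weak until
        _, gamma, phi = query
        R = reach_within(succ, S, states - phi)
        return solve(gamma, S | R | post(succ, R), states, succ)
    if op == "AWd":           # A(phi W-bar gamma), disjoint weak until
        _, phi, gamma = query
        R = reach_within(succ, S, phi)
        return solve(gamma, (S | post(succ, R)) - phi, states, succ)
    raise ValueError(op)


def AG(gamma):
    return ("AWo", gamma, set())   # AG gamma = A(gamma W° false)


# Hypothetical model: ack holds in state 1 only; state 3 is unreachable.
states = {0, 1, 2, 3}
succ = {0: [1, 2], 1: [1], 2: [2], 3: [3]}
af_ack = {1}                       # [[AF ack]], worked out by hand

invariant = solve(AG(("?",)), {0}, states, succ)
guard = solve(AG(("or", af_ack, ("not?",))), {0}, states, succ)
print(invariant, guard)
```

The first call evaluates AG? and returns the reachable states. The second evaluates AG(? → AF ack), written as AG(AF ack ∨ ¬?), and returns the weakest guard: every state except the reachable ones from which an acknowledgement is not inevitable.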



4 Simplification 

Recall that our motivation is to help the user understand the system behaviors. Although
an exact solution gives complete information, it is likely to be too complex to comprehend.
In this section, we suggest a strategy to cope with the problem by decomposing a
proposition into a set of conjuncts (for positive queries) or disjuncts (for negative queries)
using projection and don’t-care minimization. Decomposition is not a new problem
in symbolic model checking, but the usual objective is to produce a small number of
small, balanced conjuncts or disjuncts to reduce the time or space for the fixed-point
computation [e.g., 11]. Our purpose, rather, is to decompose a proposition into a possibly
large number of “simple” pieces.

Without loss of generality, we assume positive queries in this section; disjunctive
decomposition for negative queries can then be dealt with using De Morgan’s law. Our
conjunctive decomposition is a conservative approximation in the sense that the conjunction
obtained may be weaker than the given proposition. Indeed, our method bears some
resemblance to the technique of overlapping projections for approximate traversals [8].

Let $\exists Y.\,p$ denote the result of projecting the proposition $p$ onto a set $Y$ of atomic
propositions. For any symbolic state-set representation for model checking, an
implementation of projection is usually available because it is most likely used to implement
$pre_\forall$ and $post_\exists$.

For any propositions $p$ and $c$, let $p \downarrow c$ be any proposition with $p \wedge c \Leftrightarrow (p \downarrow c) \wedge c$.
Intuitively, the proposition $\neg c$ represents a don’t-care condition. Typically, an
implementation of the operation tries to choose a result that minimizes the representation
of $p \downarrow c$, and we assume that a minimized representation is also simpler to human users.
For BDDs, many operators can be used for this purpose, such as restrict [4]. For a set $C$
of propositions, let $p \downarrow C$ be any proposition with $p \wedge (\bigwedge C) \Leftrightarrow (p \downarrow C) \wedge (\bigwedge C)$. In our
implementation, we perform $p \downarrow C$ simply by computing $p \downarrow \bigwedge C$ using restrict, although
there are other possibilities.

Figure 2 shows a greedy algorithm for approximate conjunctive decomposition. We
use $\mathit{atoms}(s)$ to denote the set of atomic propositions appearing in $s$. With increasing $j$
up to the given $k$, the algorithm finds nontrivial propositions that are weaker than $s$ and
contain only $j$ atomic propositions. Redundant information in the result is reduced by
simplifying a candidate conjunct using other conjuncts already computed. The algorithm
runs in time exponential in $k$. However, this is not a serious problem in practice, because
the result will be too complicated to understand for large $k$ anyway. In our preliminary
experience, we have only used $k \leq 4$.
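
The greedy decomposition can be sketched on propositions represented explicitly as sets of satisfying assignments. The code is an illustration under stated assumptions: the don’t-care step is a crude stand-in for a BDD operator such as restrict (a candidate already implied by the collected conjuncts is replaced by true, and is otherwise kept as is), the acceptance test keeps a candidate when it is equivalent to neither true nor s, and the function and variable names are hypothetical.

```python
from itertools import combinations, product


def decompose(s, atoms, k):
    """Greedy approximate conjunctive decomposition of s.

    s is a set of assignments (tuples of truth values ordered as `atoms`).
    Returns a list of (Y, r) pairs: the atoms projected onto and the conjunct.
    """
    universe = set(product([False, True], repeat=len(atoms)))

    def project(p, keep):
        """Existentially quantify away every atom outside `keep`."""
        idx = [atoms.index(a) for a in keep]
        seen = {tuple(v[i] for i in idx) for v in p}
        return {v for v in universe if tuple(v[i] for i in idx) in seen}

    conj = set(universe)          # conjunction of the conjuncts found so far
    result = []
    for j in range(1, k + 1):
        for Y in combinations(atoms, j):
            p = project(s, Y)
            # Crude don't-care minimization: drop p if it is already implied.
            r = universe if conj <= p else p
            if r != universe and r != s:
                result.append((Y, r))
                conj &= r
    return result


# Hypothetical proposition s = x and (y or z).
atoms = ("x", "y", "z")
universe = set(product([False, True], repeat=3))
s = {v for v in universe if v[0] and (v[1] or v[2])}
C = decompose(s, atoms, 2)
print([Y for Y, _ in C])
```

For this input the sketch reports one conjunct over x alone and one over y and z, and their conjunction happens to coincide with s; in general the conjunction is only guaranteed to be implied by s.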

  {Input: proposition $s$ and $k$ with $0 < k \leq |\mathit{atoms}(s)|$}
  $C := \emptyset$
  for $j := 1$ to $k$
    for each $Y \subseteq \mathit{atoms}(s)$ with $|Y| = j$
      $r := (\exists Y.\,s) \downarrow C$
      if $r \not\Leftrightarrow \mathit{true}$ and $r \not\Leftrightarrow s$
        $C := C \cup \{r\}$
      fi
    end
  end
  {Output: $C$ with $s \Rightarrow \bigwedge C$}

Fig. 2: Approximate conjunctive decomposition of a proposition

To reduce noise from the output, before we run the decomposition algorithm, it helps
to project the proposition $s$ onto the set of atomic propositions in which the user is truly
interested. One way to find these interesting atomic propositions is to examine
the temporal-logic formulas to be checked. Sometimes the number of relevant atomic 
propositions might appear large, but the user may only want to derive properties of a 
restricted form. For example, if a subset of the atomic propositions contains $x_0, x_1,
\ldots, x_n$, the user may not be interested in each of them individually, but only in their
disjunction. In this case, we can create a new atomic proposition $d$ and compute

$$s \wedge (d \leftrightarrow (x_0 \vee x_1 \vee \cdots \vee x_n)).$$

We then project out $x_0, x_1, \ldots, x_n$, and use the result as an input to the decomposition
algorithm.
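
This substitution step can be sketched on an explicit representation. The code below is only an illustration: propositions are sets of satisfying assignments, each assignment is a frozenset of the atoms that are true, and the function name, the fresh atom name `d`, and the example proposition are assumptions.

```python
def abstract_disjunction(s, group, fresh="d"):
    """Replace the atoms in `group` by one fresh atom standing for their disjunction.

    Computes s AND (fresh <-> OR of group) and then projects out the atoms
    in `group`, working assignment by assignment.
    """
    group = set(group)
    out = set()
    for a in s:
        kept = a - group
        if a & group:               # some x_i is true, so the fresh atom is true
            kept = kept | {fresh}
        out.add(frozenset(kept))
    return out


# Hypothetical proposition over {x0, x1, busy}: busy holds exactly when some x_i does.
s = {
    frozenset(),
    frozenset({"x0", "busy"}),
    frozenset({"x1", "busy"}),
    frozenset({"x0", "x1", "busy"}),
}
abstracted = abstract_disjunction(s, ["x0", "x1"])
print(abstracted)
```

The result has two assignments, the empty one and the one where both busy and d hold, which is the proposition d ↔ busy and is much easier to read than the original.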

A final remark is that, for simplicity, we have been focusing on only atomic propo- 
sitions in our discussion. However, a model is often specified as a high-level program 
with some of its variables ranging over finite domains. In this case, several atomic pro- 
positions are used to encode a single source-level variable. When we perform projection 
and decomposition, we actually operate on these source-level variables instead of the 
atomic propositions to obtain more meaningful results. 

5 Applications 

In addition to allowing the user to infer properties based on a given pattern,
temporal-logic queries also suggest an alternative model-checking algorithm. Given a model $M$
and a formula $\varphi$, instead of determining $M \models \varphi$ using the standard backward traversals,
we can find a query $\gamma$ with $\varphi \Leftrightarrow \gamma\langle p\rangle$ for some proposition $p$, and then compute its exact
solution $s$ in $M$. We have $M \models \varphi$ if and only if $s \Rightarrow^{\gamma} p$. This gives a model-checking
algorithm with mixed forward and backward traversals. An advantage of this over the
conventional approach is that, in case $\varphi$ does not hold, the formula $\gamma\langle s\rangle$ gives insights
into why $\varphi$ fails. As an example, suppose that we check $AG((x \wedge y) \rightarrow AF\,\mathit{ack})$ by
evaluating $AG(? \rightarrow AF\,\mathit{ack})$, and obtain $x \wedge y \wedge z$ as an exact solution. This tells us that
the formula does not hold because, apart from $x$ and $y$, the condition $z$ is also necessary
for $\mathit{ack}$ to occur.
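
The example can be replayed on a small explicit model. The sketch below is an illustration under stated assumptions: the four-state model is hypothetical, propositions are compared as state sets, and the exact solution of AG(? → AF ack) is computed directly as the complement of the reachable states from which an acknowledgement is not inevitable.

```python
def post(succ, S):
    """Successors of states in S."""
    return {r for q in S for r in succ[q]}


def reachable(succ, init):
    """States reachable from `init` by forward traversal."""
    Z = set(init)
    while not post(succ, Z) <= Z:
        Z |= post(succ, Z)
    return Z


def af(succ, target):
    """States satisfying AF target, by a backward least fixed point."""
    Z = set()
    while True:
        nxt = target | {q for q in succ if all(r in Z for r in succ[q])}
        if nxt == Z:
            return Z
        Z = nxt


# Hypothetical model: ack is inevitable only from states where z also holds.
succ = {0: [1, 2], 1: [3], 2: [2], 3: [3]}
label = {0: set(), 1: {"x", "y", "z"}, 2: {"x", "y"}, 3: {"ack"}}
states, init = set(succ), {0}
ack = {q for q in states if "ack" in label[q]}

# Exact solution of AG(? -> AF ack), as a state set (weakest, since the query is negative).
s = states - (reachable(succ, init) - af(succ, ack))

# Checking AG((x and y) -> AF ack) reduces to testing that p implies s.
p = {q for q in states if {"x", "y"} <= label[q]}
print(p <= s, sorted(p - s))
```

The check fails, and the offending state is the one where x and y hold without z, which matches the diagnosis described in the text.
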

Furthermore, the approach can be used for detecting a particular form of vacuity [1].
Let $s'$ be $\exists\,\mathit{atoms}(p).\,s$, that is, the projection of $s$ onto the atomic propositions appearing
in $p$. Because $s'$ is the strongest (or weakest, for negative queries) of the solutions that
involve only the atomic propositions in $\mathit{atoms}(p)$, we have $M \models \varphi$ if and only if
$s' \Rightarrow^{\gamma} p$. However, if we have $s' \Rightarrow^{\gamma} p$ but $s' \not\Leftrightarrow p$, we may say that $\varphi$ trivially holds
because the stronger formula $\gamma\langle s'\rangle$ also holds. For the previous example, if we obtain $x$
after projecting the exact solution to $AG(? \rightarrow AF\,\mathit{ack})$ on $x$ and $y$, we know that $y$ is not
needed to produce $\mathit{ack}$ and that the original formula holds vacuously. Or the user may
suspect that $AG(x \rightarrow y)$ holds, and can verify this using model checking to learn more
about the model.

In the rest of this section, we report on some initial experience of applying
temporal-logic queries to two SMV models. We found that the technique is most useful when
unexpected properties are inferred.



5.1 A Cache Consistency Protocol 

We applied our algorithm to an abstract model of a cache consistency protocol that comes
with CMU’s SMV 2.5.3 distribution.* The model has 3408 reachable states. Among other
components, it consists of three processors $p_0$, $p_1$, and $p_2$. The temporal-logic properties
specified in the program are concerned with the propositions $p_0.\mathit{readable}$, $p_0.\mathit{writable}$,
$p_1.\mathit{readable}$, and $p_1.\mathit{writable}$. For example, an invariant listed is

$$AG\,\neg(p_0.\mathit{writable} \wedge p_1.\mathit{writable}). \qquad (1)$$

That is, $p_0$ and $p_1$ are never simultaneously writable. Note that no properties listed in
the code are about $p_2$. Its correctness was probably assumed by the symmetries in the
code.

We asked the query $AG\,?$, and projected the exact solution obtained onto $p_i.\mathit{readable}$
and $p_i.\mathit{writable}$ for $i \in \{0, 1, 2\}$. Using the conjunctive decomposition algorithm in
Figure 2 with $k = 4$, the following invariants, in addition to Formula (1) above, were
inferred:

$$AG\,\neg p_2.\mathit{writable} \qquad (2)$$

$$AG\,\neg p_2.\mathit{readable} \qquad (3)$$

$$AG(p_0.\mathit{writable} \rightarrow p_0.\mathit{readable}) \qquad (4)$$

$$AG(p_1.\mathit{writable} \rightarrow p_1.\mathit{readable}) \qquad (5)$$

Formulas (4) and (5) are evident from the code. However, Formulas (2) and (3) are
surprising. They indicate that $p_2$ is never readable or writable, and therefore $p_2$ is
not symmetric to $p_0$ or $p_1$. Upon closer examination of the code, we found a typo
in the SMV program that caused $p_2$’s faulty behavior. We fixed the error, and, as
expected, inferred that Formulas (2) and (3) no longer hold, and that the model satisfies
$AG(p_i.\mathit{writable} \rightarrow p_i.\mathit{readable})$ and $AG\,\neg(p_i.\mathit{writable} \wedge p_j.\mathit{writable})$ for every distinct
$i, j \in \{0, 1, 2\}$. In addition, we also discovered

$$AG((p_i.\mathit{readable} \wedge p_j.\mathit{readable}) \rightarrow \neg p_k.\mathit{writable}) \qquad (6)$$

for every distinct $i, j, k \in \{0, 1, 2\}$. It says that if any two of the processors are readable,
then the remaining one cannot be writable. This is not a natural property that one would
expect from every cache consistency protocol.

* File gigamax.smv in http://www.cs.cmu.edu/modelcheck/smv/smv.r2.5.3.Id.tar.gz

5.2 A Shuttle Digital Autopilot 

Another example that we looked at was an SMV model of the “shuttle digital autopilot
engines out (3E/O) contingency guidance requirements” in the NuSMV 1.1 distribution.*
There are 70 source-level variables and over lO^"* reachable states. 

One of the properties listed in the SMV program is

$$AG(\neg\mathit{cg.idle} \rightarrow AF\,\mathit{cg.finished}), \qquad (7)$$

which says that the component cg eventually terminates after it is started. We evaluated
the query $AG(? \rightarrow AF\,\mathit{cg.finished})$, and projected the exact solution on all singleton
sets of variables. In addition to the formula above, we also inferred the following two
properties:†

$$AG(\neg\mathit{cs.idle} \rightarrow AF\,\mathit{cg.finished}) \qquad (8)$$

$$AG(\neg\mathit{start\_guide} \rightarrow AF\,\mathit{cg.finished}). \qquad (9)$$

Formula (9) is easy to see from the SMV program and is not very interesting. Formula (8),
however, does not seem obvious; it says that after the component cs is started, cg will
eventually finish. Given Formulas (7) and (8), it is natural to ask whether there is any
causal relationship between $\neg\mathit{cg.idle}$ and $\neg\mathit{cs.idle}$. So we checked the formulas

$$AG(\neg\mathit{cs.idle} \rightarrow AF\,\neg\mathit{cg.idle}) \qquad (10)$$

$$AG(\neg\mathit{cg.idle} \rightarrow AF\,\neg\mathit{cs.idle}) \qquad (11)$$

and found that the first formula holds while the second does not. Note how this process
of model checking and evaluating queries in tandem allowed us to discover relationships
between the two components cs and cg.

Another formula listed is

$$AG((\mathit{cg.idle} \vee \mathit{cg.finished}) \rightarrow \neg AG((\mathit{cg.idle} \vee \mathit{cg.finished}) \vee AG\,\neg\mathit{cg.finished})). \qquad (12)$$

We evaluated the query $AG(? \rightarrow \neg AG((\mathit{cg.idle} \vee \mathit{cg.finished}) \vee AG\,\neg\mathit{cg.finished}))$,
and, to our surprise, obtained true as the exact solution. This indicates that the stronger
formula

$$AG\,\neg AG((\mathit{cg.idle} \vee \mathit{cg.finished}) \vee AG\,\neg\mathit{cg.finished}) \qquad (13)$$

holds, and that Formula (12) is in a sense vacuously true.

* The SMV model was written by Sergey Berezin,
http://afrodite.itc.it:1024/~nusmv/examples/guidance/guidance.smv
† What we call cs.idle here corresponds to cs.step = undef in the SMV program.

As our last example, cs.r is an enumerated-type variable with a range of size six.
Comments in the SMV program suggest checking the six formulas of the form

$$\neg AG(\mathit{cs.r} \neq c) \qquad (14)$$

for every $c$ in the range of cs.r, to ensure that the variable may take on any value in its
range. We instead asked only one query $AG\,?$, and, after projecting the exact solution on
cs.r, obtained true. This implies that there are no constraints on cs.r in the reachable
states, and therefore it may take on any of its possible values.



6 Future Work 

One interesting direction for future work is to extend the results to Linear Temporal
Logic (LTL). All of our definitions extend to LTL in a straightforward way, and Lemmas
1-3 hold for LTL queries as well. The proof of Theorem 1 can be trivially modified
for LTL queries, so it can be easily seen that determining whether an LTL query is
valid is equivalent to determining LTL formula validity, which is complete for PSPACE.
An advantage of LTL queries over CTL queries is their expressiveness. For example, if
the CTL formula $AG(\mathit{req} \rightarrow AF\,\mathit{ack})$ does not hold (or equivalently, the LTL formula
$G(\mathit{req} \rightarrow F\,\mathit{ack})$ does not hold), the user can ask the LTL query

$$G(\mathit{req} \rightarrow F(\mathit{ack} \vee G\,?))$$

to find out what is eventually always true if a request is never followed by an
acknowledgement. This in essence gives a summary of all the counterexamples to the formula
above. Note that the similar CTL query

$$AG(\mathit{req} \rightarrow AF(\mathit{ack} \vee AG\,?))$$

is much weaker. However, in general, it is not obvious how to evaluate valid LTL queries.
Abstract. In classical timed automata, as defined by Alur and Dill
[AD90,AD94] and since widely studied, the only operation allowed to 
modify the clocks is the reset operation. For instance, a clock can neither 
be set to a non-null constant value, nor be set to the value of another 
clock nor, in a non-deterministic way, to some value lower or higher than 
a given constant. In this paper we study such updates in detail.

We precisely characterize the frontier between decidability and
undecidability. Our main contributions are the following:

- We exhibit many classes of updates for which emptiness is undecidable.
  These classes depend on the clock constraints that are used -
  diagonal-free or not - whereas it is well known that these two kinds
  of constraints are equivalent for classical timed automata.

- We propose a generalization of the region automaton proposed by
  Alur and Dill, which makes it possible to handle larger classes of updates.
  The complexity of the decision procedure remains PSPACE-complete.



1 Introduction 

Since their introduction by Alur and Dill [AD90,AD94], timed automata have been
one of the most studied models for real-time systems. Numerous works have been
devoted to the “theoretical” comprehension of timed automata and their
extensions (among many others, see [ACD+92], [AHV93], [AFH94], [ACH94], [Wil94],
[HKWT95], [BD00], [BDGP98]) and several model-checkers are now available
(HyTech^1 [HHWT95,HHWT97], Kronos^2 [Yov97], Uppaal^3 [LPY97]). These
works have made it possible to treat many case studies (see the web pages of the tools)
and it is precisely one of them - the ABR protocol [BF99,BFKM99] - which has
motivated the present work. Indeed, the simplest and most natural model
of the ABR protocol uses updates which are not allowed in classical timed
automata, where the only authorized operations on clocks are resets. Therefore we
have considered updates constructed from simple updates of one of the following
forms:

$x :\sim c \mid x :\sim y + c$, where $x, y$ are clocks, $c \in \mathbb{Q}_+$, and $\sim\, \in \{<, \leq, =, \neq, \geq, >\}$

* This work has been partly supported by the French project RNRT “Calife”

^1 http://www-cad.eecs.berkeley.edu/~tah/HyTech/

^2 http://www-verimag.imag.fr/TEMPORISE/kronos/

E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 464-479, 2000.

More precisely, we have studied the (un)decidability of the emptiness problem
for the extended timed automata constructed with such updates. We call these
new automata updatable timed automata. We have precisely characterized
the frontier between classes of updatable timed automata for which emptiness
is decidable or not. Our main results are the following:

- We exhibit many classes of updates for which emptiness is undecidable. A
  surprising result is that these classes depend on the clock constraints that are
  used - diagonal-free (i.e. where the only allowed comparisons are between a
  clock and a constant) or not (where the difference of two clocks can also be
  compared with a constant). This point makes an important difference with
  “classical” timed automata for which it is well known that these two kinds
  of constraints are equivalent.

- We propose a generalization of the region automaton proposed by Alur and
  Dill, which makes it possible to handle large classes of updates. We thus construct an
  (untimed) automaton which recognizes the untimed language of the
  considered timed automaton. The complexity of this decision procedure remains
  PSPACE-complete.

Note that these decidable classes are not more powerful than classical timed
automata in the sense that for any updatable timed automaton of such a
class, a classical timed automaton (with $\varepsilon$-transitions) recognizing the same
language - and even most often bisimilar - can be effectively constructed.
But in most cases, an exponential blow-up seems unavoidable and thus a
transformation into a classical timed automaton cannot be used to obtain
an efficient decision procedure. These constructions of equivalent automata
are available in [BDFP00b].

The paper is organized as follows. In section 2, we present basic definitions of 
clock constraints, updates and updatable timed automata, generalizing classical 
definitions of Alur and Dill. The emptiness problem is briefly introduced in 
section 3. Section 4 is devoted to our undecidability results. In section 5, we
propose a generalization of the region automaton defined by Alur and Dill. We
then use this procedure in sections 6 (resp. 7) to exhibit large classes of
updatable timed automata using diagonal-free clock constraints (resp. arbitrary clock
constraints) for which emptiness is decidable. A short conclusion summarizes
our results.

For lack of space, this paper does not contain the proofs, which can be found in
[BDFP00a].

2 About Updatable Timed Automata 

In this section, we briefly recall some basic definitions before introducing an
extension of the timed automata initially defined by Alur and Dill [AD90,AD94].

2.1 Timed Words and Clocks

If $Z$ is any set, let $Z^*$ (resp. $Z^\omega$) be the set of finite (resp. infinite) sequences of
elements in $Z$. And let $Z^\infty = Z^* \cup Z^\omega$.

In this paper, we consider $\mathbb{T}$ as time domain, $\mathbb{Q}_+$ as the set of non-negative
rationals and $\Sigma$ as a finite set of actions. A time sequence over $\mathbb{T}$ is a finite or
infinite non-decreasing sequence $\tau = (t_i)_{i \geq 1} \in \mathbb{T}^\infty$. A timed word $\omega = (a_i, t_i)_{i \geq 1}$
is an element of $(\Sigma \times \mathbb{T})^\infty$, also written as a pair $\omega = (\sigma, \tau)$, where $\sigma = (a_i)_{i \geq 1}$
is a word in $\Sigma^\infty$ and $\tau = (t_i)_{i \geq 1}$ a time sequence in $\mathbb{T}^\infty$ of the same length.

We consider an at most countable set $\mathbb{X}$ of variables, called clocks. A clock
valuation over $\mathbb{X}$ is a mapping $v : \mathbb{X} \to \mathbb{T}$ that assigns to each clock a time value.
The set of all clock valuations over $\mathbb{X}$ is denoted $\mathbb{T}^{\mathbb{X}}$. Let $t \in \mathbb{T}$; the valuation
$v + t$ is defined by $(v + t)(x) = v(x) + t$, $\forall x \in \mathbb{X}$.

2.2 Clock Constraints 

Given a subset of clocks $X \subseteq \mathbb{X}$, we introduce two sets of clock constraints over
$X$. The most general one, denoted by $\mathcal{C}(X)$, is defined by the following grammar:

$\varphi ::= x \sim c \mid x - y \sim c \mid \varphi \wedge \varphi \mid \neg\varphi \mid true$

where $x, y \in X$, $c \in \mathbb{Q}_+$, $\sim\, \in \{<, \leq, =, \neq, \geq, >\}$

We will also use the proper subset of diagonal-free constraints, denoted by
$\mathcal{C}_{df}(X)$, where the comparison between two clocks is not allowed. This set is
defined by the grammar:

$\varphi ::= x \sim c \mid \varphi \wedge \varphi \mid true,$

where $x \in X$, $c \in \mathbb{Q}_+$ and $\sim\, \in \{<, \leq, =, \neq, \geq, >\}$

We write $v \models \varphi$ when the clock valuation $v$ satisfies the clock constraint $\varphi$.

2.3 Updates 

An update is a function from $\mathbb{T}^{\mathbb{X}}$ to $\mathcal{P}(\mathbb{T}^{\mathbb{X}})$ which assigns to each valuation a
set of valuations. In this work, we restrict ourselves to local updates which are
defined in the following way.

A simple update over a clock $z$ has one of the two following forms:

$up ::= z :\sim c \mid z :\sim y + d$

where $c \in \mathbb{Q}_+$, $d \in \mathbb{Q}$, $y \in \mathbb{X}$ and $\sim\, \in \{<, \leq, =, \neq, \geq, >\}$

Let $v$ be a valuation and $up$ be a simple update over $z$. A valuation $v'$ is in $up(v)$
if $v'(y) = v(y)$ for any clock $y \neq z$ and if $v'(z)$ verifies:

$v'(z) \sim c$ if $up = z :\sim c$

$v'(z) \sim v(y) + d$ if $up = z :\sim y + d$

A local update over a set of clocks $X$ is a collection $up = (up_i)_{1 \leq i \leq k}$ of simple
updates, where each $up_i$ is a simple update over some clock $x_i \in X$ (note that
it could happen that $x_i = x_j$ for some $i \neq j$). Let $v, v' \in \mathbb{T}^n$ be two clock
valuations. We have $v' \in up(v)$ if and only if, for any $i$, the clock valuation $v''$
defined by

$v''(x_i) = v'(x_i)$

$v''(y) = v(y)$ for any $y \neq x_i$

verifies $v'' \in up_i(v)$. The terminology local comes from the fact that $v'(x)$ depends
on $x$ only and not on the other values $v'(y)$.

Example 1. If we take the local update $(x :> y, x :< 7)$, then it means that the
value $v'(x)$ must verify: $v'(x) > v(y) \wedge v'(x) < 7$. Note that $up(v)$ may be empty.
For instance, the local update $(x :< 1, x :> 1)$ leads to an empty set.
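The semantics of simple and local updates can be illustrated by a small membership test. The following is only a sketch under our own assumptions: valuations are Python dictionaries from clock names to `Fraction` values, a simple update is encoded as a hypothetical tuple `(z, op, c, y)` (with `y = None` for the form z :~ c), and clocks that are the target of no simple update are assumed to keep their value. None of these names come from the paper.

```python
import operator
from fractions import Fraction

# Comparison symbols allowed in simple updates.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}

def in_simple_update(v, v_new, z, op, c=0, y=None):
    """True if v_new is in up(v) for the simple update z :op c (y is None)
    or z :op y + c (y is a clock name)."""
    # Every clock other than z must keep its value.
    if any(v_new[k] != v[k] for k in v if k != z):
        return False
    bound = Fraction(c) if y is None else v[y] + Fraction(c)
    # Valuations take their values in the non-negative time domain.
    return v_new[z] >= 0 and OPS[op](v_new[z], bound)

def in_local_update(v, v_new, simple_updates):
    """True if v_new is in up(v) for a local update given as a list of
    (z, op, c, y) tuples, following the definition with the valuation v''."""
    updated = {z for (z, _, _, _) in simple_updates}
    # Assumption: clocks that are never updated keep their value.
    if any(v_new[k] != v[k] for k in v if k not in updated):
        return False
    for (z, op, c, y) in simple_updates:
        v2 = dict(v)        # v''(y) = v(y) for every y different from x_i
        v2[z] = v_new[z]    # v''(x_i) = v'(x_i)
        if not in_simple_update(v, v2, z, op, c, y):
            return False
    return True
```

With `v = {"x": 0, "y": 3}`, the local update of Example 1 is `[("x", ">", 0, "y"), ("x", "<", 7, None)]`, and a valuation with `x = 5` is accepted while `x = 8` is rejected.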

For any subset $X$ of $\mathbb{X}$, $\mathcal{U}(X)$ is the set of local updates which are
collections of simple updates over clocks of $X$. In the following, we need
to distinguish the following subsets of $\mathcal{U}(X)$:

- $\mathcal{U}_0(X)$ is the set of reset updates. A reset update $up$ is an update such that
  for all clock valuations $v, v'$ with $v' \in up(v)$ and any clock $x \in X$, either
  $v'(x) = v(x)$ or $v'(x) = 0$.

- $\mathcal{U}_{cst}(X)$ is the set of constant updates. A constant update $up$ is an update
  such that for all clock valuations $v, v'$ with $v' \in up(v)$ and any clock
  $x \in X$, either $v'(x) = v(x)$ or $v'(x)$ is a rational constant independent of
  $v(x)$.



2.4 Updatable Timed Automata 

An updatable timed automaton over $\mathbb{T}$ is a tuple $\mathcal{A} = (\Sigma, Q, T, I, F, R, X)$, where
$\Sigma$ is a finite alphabet of actions, $Q$ is a finite set of states, $X \subseteq \mathbb{X}$ is a finite set
of clocks, $T \subseteq Q \times [\mathcal{C}(X) \times \Sigma \times \mathcal{U}(X)] \times Q$ is a finite set of transitions, $I \subseteq Q$
is the subset of initial states, $F \subseteq Q$ is the subset of final states, $R \subseteq Q$ is the
subset of repeated states.

Let $C \subseteq \mathcal{C}(\mathbb{X})$ be a subset of clock constraints and $U \subseteq \mathcal{U}(\mathbb{X})$ be a subset of
updates; the class $Aut(C, U)$ is the set of all timed automata whose transitions
only use clock constraints of $C$ and updates of $U$. The usual class of timed
automata, defined in [AD90], is the family $Aut(\mathcal{C}_{df}(\mathbb{X}), \mathcal{U}_0(\mathbb{X}))$.

A path in $\mathcal{A}$ is a finite or an infinite sequence of consecutive transitions:

$P = q_0 \xrightarrow{\varphi_1, a_1, up_1} q_1 \xrightarrow{\varphi_2, a_2, up_2} q_2 \cdots$, where $(q_{i-1}, \varphi_i, a_i, up_i, q_i) \in T$, $\forall i > 0$

The path is said to be accepting if it starts in an initial state ($q_0 \in I$) and either
it is finite and it ends in a final state, or it is infinite and passes infinitely
often through a repeated state. A run of the automaton through the path $P$ is
a sequence of the form:

$\langle q_0, v_0 \rangle \xrightarrow[t_1]{\varphi_1, a_1, up_1} \langle q_1, v_1 \rangle \xrightarrow[t_2]{\varphi_2, a_2, up_2} \langle q_2, v_2 \rangle \cdots$

where $\tau = (t_i)_{i \geq 1}$ is a time sequence and $(v_i)_{i \geq 0}$ are clock valuations such that:

$v_0(x) = 0$, $\forall x \in X$

$v_{i-1} + (t_i - t_{i-1}) \models \varphi_i$

$v_i \in up_i(v_{i-1} + (t_i - t_{i-1}))$

Remark that any set $up_i(v_{i-1} + (t_i - t_{i-1}))$ of a run is non-empty.

The label of the run is the timed word $w = (a_1, t_1)(a_2, t_2) \cdots$ If the path $P$ is
accepting then the timed word $w$ is said to be accepted by the timed automaton.
The set of all timed words accepted by $\mathcal{A}$ over the time domain $\mathbb{T}$ is denoted by
$L(\mathcal{A}, \mathbb{T})$, or simply $L(\mathcal{A})$.

Remark 1. A “folklore” result on timed automata states that the families
$Aut(\mathcal{C}(\mathbb{X}), \mathcal{U}_0(\mathbb{X}))$ and $Aut(\mathcal{C}_{df}(\mathbb{X}), \mathcal{U}_0(\mathbb{X}))$ are language-equivalent. This is
because any classical timed automaton (using reset updates only) can be
transformed into a diagonal-free classical timed automaton recognizing the same
language (see [BDGP98] for a proof). Another “folklore” result states that
constant updates are not more powerful than reset updates, i.e. the families
$Aut(\mathcal{C}(\mathbb{X}), \mathcal{U}_{cst}(\mathbb{X}))$ and $Aut(\mathcal{C}(\mathbb{X}), \mathcal{U}_0(\mathbb{X}))$ are language-equivalent.

3 The Emptiness Problem 

For verification purposes, a fundamental question about timed automata is to
decide whether the accepted language is empty. This problem is called the
emptiness problem. To simplify, we will say that a class of timed automata is decidable
if the emptiness problem is decidable for this class. The following result, due to
Alur and Dill [AD90], is one of the most important results about timed automata.

Theorem 1. The class $Aut(\mathcal{C}(\mathbb{X}), \mathcal{U}_0(\mathbb{X}))$ is decidable.

The principle of the proof is the following. Let $\mathcal{A}$ be an automaton of
$Aut(\mathcal{C}(\mathbb{X}), \mathcal{U}_0(\mathbb{X}))$; then a Büchi automaton (often called the region automaton of
$\mathcal{A}$) which recognizes the untimed language $Untime(L(\mathcal{A}))$ of $L(\mathcal{A})$ is effectively
constructible. The untimed language of $\mathcal{A}$ is defined as follows: $Untime(L(\mathcal{A})) =
\{\sigma \in \Sigma^\infty \mid$ there exists a time sequence $\tau$ such that $(\sigma, \tau) \in L(\mathcal{A})\}$.

The emptiness of $L(\mathcal{A})$ is obviously equivalent to the emptiness of
$Untime(L(\mathcal{A}))$ and since the emptiness of a Büchi automaton on words is decidable
[HU79], the result follows. In fact, the result is more precise: testing emptiness
of a timed automaton is PSPACE-complete (see [AD94] for the proofs).

Remark 2. From [AD94] (Lemma 4.1) it suffices to prove the theorem above for
timed automata where all constants appearing in clock constraints are integers
(and not arbitrary rationals). Indeed, for any timed automaton $\mathcal{A}$, there exists
some positive integer $\delta$ such that for any constant $c$ of a clock constraint of $\mathcal{A}$,
$\delta \cdot c$ is an integer. Let $\mathcal{A}'$ be the timed automaton obtained from $\mathcal{A}$ by replacing
each constant $c$ by $\delta \cdot c$; then it is immediate to verify that $L(\mathcal{A}')$ is empty if and
only if $L(\mathcal{A})$ is empty.




Are Timed Automata Updatable? 469 



4 Undecidable Classes of Updatable Timed Automata 

In this section we exhibit some important classes of updatable timed automata 
which are undecidable. All the proofs are reductions of the emptiness problem 
for counter machines. 



4.1 Two Counters Machine 



Recall that a two-counter machine is a finite set of instructions over two counters
($x$ and $y$). There are two types of instructions over counters:

- incrementation instruction of counter $i \in \{x, y\}$:

    p : i := i + 1 ; goto q    (where p and q are instruction labels)

- decrementation (or zero-testing) instruction of counter $i \in \{x, y\}$:

    p : if i > 0
          then i := i - 1 ; goto q
          else goto q'

The machine starts at the instruction labelled by $s_0$ with $x = y = 0$ and stops at a
special instruction Halt labelled by $s_f$.

Theorem 2. The emptiness problem for two-counter machines is undecidable
[Min67].
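To make the machine model concrete, the following sketch interprets a two-counter machine for a bounded number of steps. The encoding of instructions as tuples and the function name are our own assumptions for illustration; the step bound is needed precisely because, by Theorem 2, no procedure can decide halting in general.

```python
def run_counter_machine(program, start, halt, max_steps=10_000):
    """Run a two-counter machine for at most max_steps instructions.

    program maps a label to ("inc", counter, q) or ("dec", counter, q, q_else).
    Returns the final counters if the halt label is reached, else None.
    """
    counters = {"x": 0, "y": 0}
    label = start
    for _ in range(max_steps):
        if label == halt:
            return counters
        instr = program[label]
        if instr[0] == "inc":
            _, i, q = instr
            counters[i] += 1
            label = q
        else:
            _, i, q, q_else = instr
            if counters[i] > 0:      # zero test
                counters[i] -= 1
                label = q
            else:
                label = q_else
    # Step budget exhausted: report success only if we just reached Halt.
    return counters if label == halt else None
```

For instance, a program that increments x twice and then moves the content of x into y halts with x = 0 and y = 2, while a program looping on an increment yields `None` once the step budget is spent.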



4.2 Diagonal-Free Automata with Updates $x := x - 1$

We consider here a diagonal-free constraints class.

Proposition 1. Let $U$ be a set of updates containing both $\{x := x - 1 \mid x \in \mathbb{X}\}$
and $\mathcal{U}_0(\mathbb{X})$. Then the class $Aut(\mathcal{C}_{df}(\mathbb{X}), U)$ is undecidable.

Sketch of proof. We simulate a two-counter machine $\mathcal{M}$ with an updatable
timed automaton $\mathcal{A}_{\mathcal{M}} = (\Sigma, Q, T, I, F, R, X)$ with $X = \{x, y, z\}$, $\Sigma = \{a\}$ (for
convenience, labels are omitted in the proof) and equipped with updates
$x := x - 1$ and $y := y - 1$. Clocks $x$ and $y$ simulate the two counters.
Simulation of an increment appears in Figure 1. Counter $x$ is implicitly
incremented by letting time run for 1 unit (this is controlled with the
test $z = 1$). Then the other counter $y$ is decremented with the $y := y - 1$ update,
which cancels the effect of the elapsed time unit on $y$.

Fig. 1. Simulation of an incrementation operation over counter $x$.

Simulation of a decrement appears in Figure 2. Counter $x$ is either decremented
using the $x := x - 1$ update if $x \geq 1$, or unchanged otherwise.

Fig. 2. Simulation of a decrementation operation on the counter $x$.

Remark that we never compare two clocks but only use guards of the form $i \sim c$
with $i \in \{x, y, z\}$ and $c \in \{0, 1\}$.

To complete the definition of $\mathcal{A}_{\mathcal{M}}$, we set $I = \{s_0\}$ and $F = \{s_f\}$. The language
of $\mathcal{M}$ is empty if and only if the language of $\mathcal{A}_{\mathcal{M}}$ is empty and this implies
undecidability of the emptiness problem for the class $Aut(\mathcal{C}_{df}(\mathbb{X}), U)$.
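The effect of the two gadgets on the clock values can be traced with a few lines of code. This is a sketch only: since Figures 1 and 2 are not reproduced here, we assume that each gadget is entered with z = 0 and that the increment gadget resets z when it finishes; the function names and the branch labels `"q"` and `"q_else"` are hypothetical.

```python
def simulate_increment_x(clocks):
    """Increment gadget for counter x: one time unit elapses, then y := y - 1."""
    x, y, z = clocks["x"], clocks["y"], clocks["z"]
    assert z == 0, "assumption: the gadget is entered with z = 0"
    # Let exactly one time unit pass: every clock advances by 1 (guard z = 1).
    x, y, z = x + 1, y + 1, z + 1
    assert z == 1
    # Undo the side effect on the other counter, then reset the auxiliary clock
    # (the reset of z is our assumption).
    y = y - 1
    z = 0
    return {"x": x, "y": y, "z": z}

def simulate_decrement_x(clocks):
    """Decrement (zero-test) gadget for counter x; no time elapses."""
    x, y, z = clocks["x"], clocks["y"], clocks["z"]
    assert z == 0
    if x >= 1:
        return {"x": x - 1, "y": y, "z": z}, "q"      # 'then' branch
    return {"x": x, "y": y, "z": z}, "q_else"         # 'else' branch
```

Only comparisons of a single clock with 0 or 1 are used, which matches the diagonal-free setting of Proposition 1.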

4.3 Automata with Updates $x := x + 1$ or $x :> 0$ or $x :> y$ or $x :< y$

Surprisingly, classes of arbitrary timed automata with special updates are
undecidable.

Proposition 2. Let $U$ be a set of updates containing $\mathcal{U}_0(\mathbb{X})$ and (1) $\{x := x + 1 \mid x \in \mathbb{X}\}$
or (2) $\{x :> 0 \mid x \in \mathbb{X}\}$ or (3) $\{x :> y \mid x, y \in \mathbb{X}\}$ or (4) $\{x :< y \mid x, y \in \mathbb{X}\}$,
then the class $Aut(\mathcal{C}(\mathbb{X}), U)$ is undecidable.

Sketch of proof. The proofs are four variations of the construction given for
Proposition 1. The idea is to replace every transition labelled with updates $x :=
x - 1$ or $y := y - 1$ (framed with dashed lines on pictures) by a small automaton
involving the other kinds of updates only. The counter machine will now be
simulated by an updatable timed automaton with four clocks $\{w, x, y, z\}$. We
show how to simulate an $x := x - 1$ in any of the four cases:

(1) Firstly clock $w$ is reset, then update $w := w + 1$ is performed until $x - w = 1$
    (recall that $x$ simulates a counter and that we are interested in its integer
    values). Secondly, clock $x$ is reset and update $x := x + 1$ is performed until
    $x = w$.

(2) A $w :> 0$ is guessed, followed by a test $x - w = 1$. Then an $x :> 0$ is guessed,
    followed by a test $x = w$.

(3) Clock $w$ is reset, $w :> w$ is guessed and test $x - w = 1$ is made. Then clock
    $x$ is reset, $x :> x$ is guessed and test $x = w$ is made.

(4) A $w :< x$ is guessed, followed by test $x - w = 1$. Then an $x :< x$ is guessed,
    followed by a test $x = w$.

In the four cases, operations are made instantaneously with the help of test $z = 0$
performed at the beginning and at the end of the decrementation simulation.
Remark that in every case we use comparisons of clocks. We will see in section 6
that classes of diagonal-free timed automata equipped with any of these four
updates are decidable.
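Case (1) can be traced on integer clock values as follows. This is a minimal sketch, assuming the clocks hold integers and that no time elapses during the simulation; the function name is ours.

```python
def decrement_with_increments(x):
    """Simulate x := x - 1 for an integer-valued clock x >= 1, using only
    resets, updates of the form c := c + 1 and the tests x - w = 1, x = w."""
    assert isinstance(x, int) and x >= 1
    w = 0                  # reset w
    while x - w != 1:      # repeat w := w + 1 until the test x - w = 1 holds
        w += 1
    x = 0                  # reset x
    while x != w:          # repeat x := x + 1 until the test x = w holds
        x += 1
    return x
```

Both loops rely on diagonal tests ($x - w = 1$ and $x = w$), which is why this construction is not available in the diagonal-free setting.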

Let us end the current section with a result about mixed updates. Updates of
the kind $y + c \leq: x :\leq z + d$ (with $c, d \in \mathbb{N}$) can simulate clock comparisons. In
fact, in order to simulate a test $x - w = 1$, it suffices to guess a $w + 1 \leq: z' :\leq x$
followed by an $x \leq: z' :\leq w + 1$. Both guesses have solutions if and only if
$[w + 1; x] = [x; w + 1] = \{x\}$ if and only if $x - w = 1$. In conclusion, we cannot
freely mix different kinds of updates while keeping diagonal-free automata
decidable:

Proposition 3. Let $U$ be a set of updates containing $\mathcal{U}_0(\mathbb{X})$ and $\{x + c \leq: y :\leq
z + d \mid x, y, z \in \mathbb{X},\ c, d \in \mathbb{N}\}$. Then the class $Aut(\mathcal{C}_{df}(\mathbb{X}), U)$ is undecidable.

5 Construction of an Abstract Region Automaton 

We want to check emptiness of the timed language accepted by some timed au- 
tomaton. To this aim, we will use a technique based on the original construction 
of the region automaton ([AD94]). 

5.1 Construction of a Region Graph 

Let $X \subseteq \mathbb{X}$ be a finite set of clocks. A family of regions over $X$ is a couple
$(\mathcal{R}, Succ)$ where $\mathcal{R}$ is a finite set of regions (i.e. of subsets of $\mathbb{T}^X$) and the
successor function $Succ : \mathcal{R} \to \mathcal{R}$ verifies that for any region $R \in \mathcal{R}$ the following
holds:

- for each $v \in R$, there exists $t \in \mathbb{T}$ such that $v + t \in Succ(R)$ and for every
  $0 \leq t' \leq t$, $v + t' \in (R \cup Succ(R))$

- if $v \in R$, then for all $t \in \mathbb{T}$, $v + t \in Succ^*(R)$

Let $U \subseteq \mathcal{U}(X)$ be a finite set of updates. Each update $up \in U$ naturally induces a
function $up : \mathcal{R} \to \mathcal{P}(\mathcal{R})$ which maps each region $R$ into the set $\{R' \in \mathcal{R} \mid up(R) \cap
R' \neq \emptyset\}$. The set of regions $\mathcal{R}$ is compatible with $U$ if for all $up \in U$ and for all
$R, R' \in \mathcal{R}$:

$R' \in up(R) \iff \forall v \in R,\ \exists v' \in R'$ such that $v' \in up(v)$

Then, the region graph associated with $(\mathcal{R}, Succ, U)$ is a graph whose set of nodes
is $\mathcal{R}$ and whose edges are of two distinct types:

$R \to R'$ if $R' = Succ(R)$

$R \Rightarrow_{up} R'$ if $R' \in up(R)$

Let $C \subseteq \mathcal{C}(X)$ be a finite set of clock constraints. The set of regions $\mathcal{R}$ is
compatible with $C$ if for all $\varphi \in C$ and for all $R \in \mathcal{R}$: either $R \subseteq \varphi$ or $R \subseteq \neg\varphi$.

5.2 Construction of the Region Automaton 

Let $\mathcal{A}$ be a timed automaton in $Aut(C, U)$. Let $(\mathcal{R}, Succ)$ be a family of
regions such that $\mathcal{R}$ is compatible with $C$ and $U$. We define the region automaton
$\Gamma_{\mathcal{R},Succ}(\mathcal{A})$ associated with $\mathcal{A}$ and $(\mathcal{R}, Succ)$, as the finite (untimed) automaton
defined as follows:

- Its set of locations is $Q \times \mathcal{R}$; its initial locations are $(q_0, \mathbf{0})$ where $q_0$ is initial
  and $\mathbf{0}$ is the region where all clocks are equal to zero; its repeated locations
  are $(r, R)$ where $r$ is repeated in $\mathcal{A}$ and $R$ is any region; its final locations
  are $(f, R)$ where $f$ is final in $\mathcal{A}$ and $R$ is any region.

- Its transitions are defined by:

  • $(q, R) \xrightarrow{\varepsilon} (q, R')$ if $R \to R'$ is a transition of the region graph,

  • $(q, R) \xrightarrow{a} (q', R')$ if there exists a transition $(q, \varphi, a, up, q')$ in $\mathcal{A}$ such
    that $R \subseteq \varphi$ and $R \Rightarrow_{up} R'$ is a transition of the region graph.

Theorem 3. Let $\mathcal{A}$ be a timed automaton in $Aut(C, U)$ where $C$ (resp. $U$) is
a finite set of clock constraints (resp. of updates). Let $(\mathcal{R}, Succ)$ be a family of
regions such that $\mathcal{R}$ is compatible with $C$ and $U$. Then the automaton $\Gamma_{\mathcal{R},Succ}(\mathcal{A})$
accepts the language $Untime(L(\mathcal{A}))$.

Assume we can encode a region in polynomial space; then we can decide the
emptiness of the language in polynomial space. It suffices to guess an accepting
run in the automaton by remembering only the two current successive
configurations of the region automaton (this is the same proof as in [AD94]).
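The construction can be exercised by a plain reachability search over the locations $(q, R)$. The sketch below is not the polynomial-space procedure described above: it stores every visited location and only checks acceptance by final states (finite words), leaving repeated states aside. The three callbacks `succ`, `satisfies` and `update_successors` are hypothetical interfaces standing for the successor function, the inclusion of a region in a guard, and the update edges of the region graph.

```python
from collections import deque

def region_automaton_nonempty(initial, final, transitions, zero_region,
                              succ, satisfies, update_successors):
    """Search the region automaton for a reachable final location.

    transitions: iterable of (q, guard, action, up, q2) tuples.
    succ(R): time successor region; satisfies(R, guard): whether R is
    included in guard; update_successors(R, up): regions R2 with R =>up R2.
    """
    start = [(q, zero_region) for q in initial]
    seen, queue = set(start), deque(start)
    while queue:
        q, R = queue.popleft()
        if q in final:
            return True
        # Time transition (q, R) -> (q, Succ(R)).
        nxt = [(q, succ(R))]
        # Discrete transitions (q, R) -a-> (q2, R2).
        for (p, guard, _action, up, q2) in transitions:
            if p == q and satisfies(R, guard):
                nxt.extend((q2, R2) for R2 in update_successors(R, up))
        for loc in nxt:
            if loc not in seen:
                seen.add(loc)
                queue.append(loc)
    return False
```

Regions must be hashable for the `seen` set; any encoding of the regions considered in the next sections can be plugged in through the callbacks.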

We will now study some classes of timed automata, and consider particular 
regions which verify the conditions required by the region automaton. This will 
lead us to some decidability results using the above construction. 



6 Considering Diagonal-Free Updatable Timed Automata 



Definition of the Regions We Consider - We consider a finite set of clocks $X \subseteq \mathbb{X}$.
We associate an integer constant $c_x$ to each clock $x \in X$, and we define the set
of intervals:

$\mathcal{I}_x = \{[c] \mid 0 \leq c \leq c_x\} \cup \{]c; c+1[ \mid 0 \leq c < c_x\} \cup \{]c_x; +\infty[\}$

Let $\alpha$ be a tuple $((I_x)_{x \in X}, \prec)$ where:

- $\forall x \in X$, $I_x \in \mathcal{I}_x$

- $\prec$ is a total preorder on $X_0 = \{x \in X \mid I_x$ is an interval of the form $]c; c+1[\}$

The region (defined by) $\alpha$ is thus

$R(\alpha) = \{ v \in \mathbb{T}^X \mid \forall x \in X,\ v(x) \in I_x$ and $\forall x, y \in X_0,\ x \prec y \iff frac(v(x)) \leq frac(v(y)) \}$

The set of all regions defined in such a way will be denoted by $\mathcal{R}_{(c_x)_{x \in X}}$.
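As an illustration of this definition, the region containing a given valuation can be computed directly. The encoding below is our own assumption: an interval is a tuple `("pt", c)`, `("open", c)` or `("inf",)`, and the preorder is stored as a tuple of blocks of clocks, ordered by increasing fractional part.

```python
from fractions import Fraction
from math import floor

def region_of(v, c_max):
    """Return the region of valuation v (clock -> Fraction) for the maximal
    constants c_max (clock -> int) as a pair (intervals, order).

    intervals[x] is ("pt", c) for [c], ("open", c) for ]c; c+1[ or ("inf",)
    for ]c_x; +oo[.  order lists the clocks of X_0 in blocks of increasing
    fractional part; clocks in the same block have equal fractional parts.
    """
    intervals, fracs = {}, {}
    for x, val in v.items():
        if val > c_max[x]:
            intervals[x] = ("inf",)
        elif val == floor(val):
            intervals[x] = ("pt", int(val))
        else:
            intervals[x] = ("open", floor(val))
            fracs[x] = val - floor(val)
    order = tuple(
        frozenset(x for x in fracs if fracs[x] == f)
        for f in sorted(set(fracs.values()))
    )
    return intervals, order
```

Two valuations belong to the same region exactly when `region_of` returns the same pair for both, which gives a quick way to experiment with the definition.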




Example 2. As an example, assume we have
only two clocks $x$ and $y$ with the constants
$c_x = 3$ and $c_y = 2$. Then, the set of regions
associated with those constants is described in
the figure beside. The hashed region is defined
by the following: $I_x = ]1; 2[$, $I_y = ]0; 1[$ and the
preorder $\prec$ is defined by $x \prec y$ and $y \not\prec x$.
We obtain immediately the following proposition:

Proposition 4. Let $C \subseteq \mathcal{C}_{df}(X)$ be such that for any clock constraint $x \sim c$ of
$C$, it holds $c \leq c_x$. Then the set of regions $\mathcal{R}_{(c_x)_{x \in X}}$ is compatible with $C$.

Note that the result does not hold for arbitrary sets of constraints included in $\mathcal{C}(X)$.
For example, the region $(]1; +\infty[ \times ]1; +\infty[, \emptyset)$ is neither included in $x - y \leq 1$
nor in $x - y > 1$.



Computation of the Successor Function - Let $R = ((I_x)_{x \in X}, \prec)$ be a region. We
set $Z = \{x \in X \mid I_x$ is of the form $[c]\}$. Then the region $Succ(R) = ((I'_x)_{x \in X}, \prec')$
is defined as follows, distinguishing two cases:

1. If $Z \neq \emptyset$, then

   - $I'_x = I_x$ if $x \notin Z$; $I'_x = ]c; c+1[$ if $I_x = [c]$ with $c \neq c_x$; $I'_x = ]c_x; +\infty[$ if $I_x = [c_x]$

   - $x \prec' y$ if ($x \prec y$) or $I_x = [c]$ with $c \neq c_x$ and $I'_y$ has the form $]d; d+1[$

2. If $Z = \emptyset$, let $M$ be the set of maximal elements of $\prec$. Then

   - $I'_x = I_x$ if $x \notin M$; $I'_x = [c+1]$ if $x \in M$ and $I_x = ]c; c+1[$

   - $\prec'$ is the restriction of $\prec$ to $\{x \in X \mid I'_x$ has the form $]d; d+1[\}$

Taking the previous example, the successor of the gray region is defined by
$I_x = ]1; 2[$ and $I_y = [1]$ (drawn as the thick line).
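The two cases of the successor computation translate into a short function. As before, this is a sketch under our own encoding (intervals as `("pt", c)`, `("open", c)` or `("inf",)`, the preorder as a tuple of blocks of clocks by increasing fractional part); it is meant to mirror the definition, not to reproduce an implementation from the paper.

```python
def succ(intervals, order, c_max):
    """Time successor of a region given as (intervals, order)."""
    Z = [x for x, I in intervals.items() if I[0] == "pt"]
    new = dict(intervals)
    if Z:
        # Case 1: the clocks sitting on an integer value leave it.
        entering = set()
        for x in Z:
            c = intervals[x][1]
            if c < c_max[x]:
                new[x] = ("open", c)
                entering.add(x)   # joins X_0 with the smallest fractional part
            else:
                new[x] = ("inf",)
        new_order = ((frozenset(entering),) if entering else ()) + tuple(order)
        return new, new_order
    if not order:
        # Every clock is beyond its maximal constant: nothing changes.
        return new, ()
    # Case 2: the clocks with maximal fractional part reach the next integer.
    for x in order[-1]:
        new[x] = ("pt", intervals[x][1] + 1)
    return new, tuple(order[:-1])
```

On the region of Example 2 ($I_x = ]1; 2[$, $I_y = ]0; 1[$, $x \prec y$), the function returns $I_x = ]1; 2[$ and $I_y = [1]$, in agreement with the text.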

We will now define a suitable set of updates compatible with the regions. 



What About the Updates? - We consider now a local update $up = (up_x)_{x \in X}$
over a finite set of clocks $X \subseteq \mathbb{X}$ such that for any clock $x$, $up_x$ is in one of
the four following subsets of $\mathcal{U}(X)$, each of them being given by an abstract
grammar:

- $det_x ::= x := c \mid x := z + d$ with $c \in \mathbb{N}$, $d \in \mathbb{Z}$ and $z \in X$.

- $inf_x ::= x :\triangleleft c \mid x :\triangleleft z + d \mid inf_x \wedge inf_x$ with $\triangleleft \in \{<, \leq\}$, $c \in \mathbb{N}$, $d \in \mathbb{Z}$ and
  $z \in X$.

- $sup_x ::= x :\triangleright c \mid x :\triangleright z + d \mid sup_x \wedge sup_x$ with $\triangleright \in \{>, \geq\}$, $c \in \mathbb{N}$, $d \in \mathbb{Z}$ and
  $z \in X$.

- $int_x ::= x :\in (c; d) \mid x :\in (c; z + d) \mid x :\in (z + c; d) \mid x :\in (z + c; z + d)$ where
  ( and ) are either [ or ], $z$ is a clock and $c, d$ are in $\mathbb{Z}$.

Let us denote by $\mathcal{U}_1(X)$ this set of local updates. As in the case of simple updates,
we will give a necessary and sufficient condition for $R'$ to be in $up(R)$ when $R$,
$R'$ are regions and $up$ is a local update.
Case of Simple Updates - We will first prove that for any simple update $up$,
$\mathcal{R}_{(c_x)_{x \in X}}$ is compatible with $up$. To this aim, we construct the regions belonging
to $up(R)$ by giving a necessary and sufficient condition for a given region $R'$ to
be in $up(R)$.

Assume that $R = ((I_x)_{x \in X}, \prec)$ where $\prec$ is a total preorder on $X_0$ and that $up$ is
a simple update over $z$; then the region $R' = ((I'_x)_{x \in X}, \prec')$ (where $\prec'$ is a total
preorder on $X'_0$) is in $up(R)$ if and only if $I'_x = I_x$ for all $x \neq z$ and:

if $up = z :\sim c$ with $c \in \mathbb{N}$: $I'_z$ can be any interval of $\mathcal{I}_z$ which intersects
$\{\gamma \mid \gamma \sim c\}$ and

- either $I'_z$ has the form $[d]$ or $]c_z; +\infty[$, $X'_0 = X_0 \setminus \{z\}$ and $\prec' = \prec \cap (X'_0 \times X'_0)$

- or $I'_z$ has the form $]d; d+1[$, $X'_0 = X_0 \cup \{z\}$ and $\prec'$ is any total
  preorder which coincides with $\prec$ on $X_0 \setminus \{z\}$.

if $up = z :\sim y + c$ with $c \in \mathbb{Z}$: we assume in this case that $c_z \leq c_y + c$. Thus
if $I_y$ is any interval in $\mathcal{I}_y$ then $I_y + c$ is included in an interval of $\mathcal{I}_z$ (in
particular, whenever $I_y$ is non bounded then $I_y + c$ is non bounded, which
is essential in order to prove the compatibility).

$I'_z$ can be any interval of $\mathcal{I}_z$ such that there exist $\alpha \in I'_z$, $\beta \in I_y$ with
$\alpha \sim \beta + c$ and

- either $I'_z$ has the form $[d]$ or $]c_z; +\infty[$, $X'_0 = X_0 \setminus \{z\}$ and $\prec' = \prec \cap (X'_0 \times X'_0)$

- or $I'_z$ has the form $]d; d+1[$, $X'_0 = X_0 \cup \{z\}$ and

  • If $y \notin X_0$, $\prec'$ is any total preorder on $X'_0$ which coincides with $\prec$ on
    $X_0 \setminus \{z\}$.

  • If $y \in X_0$, then:

    * either $I_y + c \neq I'_z$ and $\prec'$ is any total preorder on $X'_0$ which
      coincides with $\prec$ on $X_0 \setminus \{z\}$

    * or $I_y + c = I'_z$, and $\prec'$ is any total preorder on $X'_0$ which
      coincides with $\prec$ on $X_0 \setminus \{z\}$ and verifies ...

From this construction, it is easy to verify that $\mathcal{R}_{(c_x)_{x \in X}}$ is compatible with any
simple update.




Example 3. We take the regions described in the
figure beside. We want to compute the updating
successors of the region 0 by the update $x :> y + 2$.
The three updating successors are drawn in the
figure beside. Their equations are:

- Region 1: $I'_x = ]2; 3[$, $I'_y = ]0; 1[$ and $y \prec' x$

- Region 2: $I'_x = [3]$, $I'_y = ]0; 1[$

- Region 3: $I'_x = ]3; +\infty[$, $I'_y = ]0; 1[$
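The interval part of this example can be checked mechanically. The sketch below handles only updates of the form $z :> y + c$ and ignores the preorder; it relies on the observation that some $\alpha \in I'_z$ and $\beta \in I_y$ with $\alpha > \beta + c$ exist exactly when the supremum of $I'_z$ is strictly greater than the infimum of $I_y$ plus $c$. The interval encoding `(low, high, closed)` and the function names are our own, and we assume that region 0 has $I_y = ]0; 1[$, as the listed successors suggest.

```python
def intervals_of(c_max):
    """All intervals of I_z as (low, high, closed) triples; high is None
    for the unbounded interval ]c_z; +oo[."""
    out = []
    for c in range(c_max + 1):
        out.append((c, c, True))             # [c]
        if c < c_max:
            out.append((c, c + 1, False))    # ]c; c+1[
    out.append((c_max, None, False))         # ]c_z; +oo[
    return out

def targets_greater_than(I_y, c, c_z):
    """Intervals I'_z containing some alpha > beta + c with beta in I_y."""
    low_y = I_y[0]
    result = []
    for I in intervals_of(c_z):
        high = I[1]
        # alpha and beta exist iff sup(I'_z) > inf(I_y) + c.
        if high is None or high > low_y + c:
            result.append(I)
    return result
```

With $I_y = ]0; 1[$, $c = 2$ and $c_x = 3$, the function returns the intervals $]2; 3[$, $[3]$ and $]3; +\infty[$, i.e. the $I'_x$ components of Regions 1 to 3.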
Remark 3. Note that the fact that updates of the form $z := z - 1$ (even used with
diagonal-free constraints only) lead to undecidability of emptiness (Section 4)
is not in contradiction with our construction. This is because we cannot assume
that $c_z \leq c_z - 1$.



Case of Local Updates - We will use the semantics of the local updates from
section 2.3 to compute the updating successors of a region. Assume that $R =
((I_x)_{x \in X}, \prec)$ and that $up = (up_x)_{x \in X}$ is a local update over $X$; then $R' =
((I'_x)_{x \in X}, \prec') \in up(R)$ if and only if there exists a total preorder $\prec''$ on a subset
of $X \cup X'$ (where $X'$ is a disjoint copy of $X$) verifying

$y \prec'' z \iff y \prec z$ for all $y, z \in X$

$y' \prec'' z' \iff y \prec' z$ for all $y, z \in X$

and such that, for any simple update $up_i$ appearing in $up_x$, the region $R_i =
((I_{i,x})_{x \in X}, \prec_i)$ defined by

$I_{i,x} = I_x$ if $x \neq x_i$, and $I_{i,x} = I'_x$ otherwise

• $y \prec_i z \iff y \prec z$ for $y, z \neq x_i$

• $x_i \prec_i z \iff x'_i \prec'' z$ for $z \neq x_i$

• $z \prec_i x_i \iff z \prec'' x'_i$ for $z \neq x_i$

belongs to $up_i(R)$.

Assume now that $U$ is a set of updates included in $\mathcal{U}_1(X)$. It is then technical,
but without difficulty, to show that under the following hypothesis:

- for each simple update $y :\sim z + c$ which is part of some local update of $U$,
  condition $c_y \leq c_z + c$ holds

the family of regions $(\mathcal{R}_{(c_x)_{x \in X}}, Succ)$ is compatible with $U$. In fact, the set
$X \cup X'$ and the preorder $\prec''$ both encode the original and the updating regions.
This construction allows us to obtain the desired result for local updates.

Remark 4. In our definition of $\mathcal{U}_1(X)$, we considered a restricted set of local
updates. Without such a restriction, it can happen that no such preorder $\prec''$ exists.
For example, let us take the local update $x :> y \wedge x :< z$ and the region $R$
defined by $I_x = [0]$, $I_y = I_z = ]0; 1[$, $z \prec y$ and $y \not\prec z$. Then the preorder $\prec''$ should
verify the following: $y \prec'' x'$, $x' \prec'' z$, $z \prec'' y$ and $y \not\prec'' z$, but this leads to a
contradiction, since the first two conditions give $y \prec'' z$ by transitivity.
There is no such problem for the local updates from $\mathcal{U}_1(X)$, as we
only require each clock $x'$ to have a value greater than or lower than some
other clock values.

So far, we have only considered updates with integer constants, but an
immediate generalization of Remark 2 allows us to treat updates with arbitrary rational
constants. We have therefore proved the following theorem:

Theorem 4. Let $C \subseteq \mathcal{C}_{df}(X)$ be a set of diagonal-free clock constraints. Let $U \subseteq
\mathcal{U}_1(X)$ be a set of updates. Let $(c_x)_{x \in X}$ be a family of constants such that for each
clock constraint $y \sim c$ of $C$, condition $c \leq c_y$ holds and for each update $z :\sim y + c$
of $U$, condition $c_z \leq c_y + c$ holds. Then the family of regions $(\mathcal{R}_{(c_x)_{x \in X}}, Succ)$ is
compatible with $C$ and $U$.
Remark 5. Obviously, it is not always the case that there exists a family of
integer constants such that for each update $y :\sim z + c$ of $U$, condition $c_y \leq c_z + c$
holds. Nevertheless:

- It is the case when all the constants $c$ appearing in updates $y :\sim z + c$ are
  non-negative.

- In the general case, the existence of such a family is decidable thanks to
  results on systems of linear Diophantine inequations [Dom91].
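The hypotheses of Theorem 4 are easy to check for a given family of constants. The sketch below also covers the first case of Remark 5: when every update constant is non-negative, taking the same bound for all clocks (the largest constant of a guard) satisfies both conditions, which is our own reading of why that case always works. The tuple encodings and function names are hypothetical, and the general case (negative constants) is deliberately left out.

```python
def check_constants(c_max, guards, updates):
    """Check the hypotheses of Theorem 4 for integer data.

    guards: list of (clock, c) for clock constraints  clock ~ c.
    updates: list of (z, y, c) for simple updates  z :~ y + c.
    """
    ok_guards = all(c <= c_max[y] for (y, c) in guards)
    ok_updates = all(c_max[z] <= c_max[y] + c for (z, y, c) in updates)
    return ok_guards and ok_updates

def uniform_constants(clocks, guards, updates):
    """When every update constant is non-negative, one common bound works."""
    if any(c < 0 for (_, _, c) in updates):
        return None   # general case: requires solving linear inequations
    K = max([c for (_, c) in guards], default=0)
    return {x: K for x in clocks}
```

For the update $z := z - 1$, encoded as `("z", "z", -1)`, `check_constants` fails for every choice of $c_z$, in line with Remark 3.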



For any couple $(C, U)$ verifying the hypotheses of Theorem 4, by applying
Theorem 3, the family $Aut(C, U)$ is decidable. Moreover, since we can encode a region
in polynomial space, testing emptiness is in PSPACE, and even PSPACE-complete
(since it is the case for classical timed automata).

Remark 6. The p-automata used in [BF99] to model the ABR protocol can
be easily transformed into updatable timed automata from a class which fulfills
the hypotheses of Theorem 4. Their emptiness is then decidable.



7 Considering Arbitrary Updatable Timed Automata 

In this section, we allow arbitrary clock constraints. We thus need to define a
slightly more complicated set of regions. To this purpose we consider for each pair
$y, z$ of clocks (taken in $X \subseteq \mathbb{X}$, a finite set of clocks), two constants $d^-_{y,z}$ and $d^+_{y,z}$,
and we define

$\mathcal{J}_{y,z} = \{]-\infty; d^-_{y,z}[\} \cup \{[d] \mid d^-_{y,z} \leq d \leq d^+_{y,z}\} \cup \{]d; d+1[ \mid d^-_{y,z} \leq d < d^+_{y,z}\} \cup \{]d^+_{y,z}; +\infty[\}$

The region defined by a tuple $((I_x)_{x \in X}, (J_{x,y})_{x,y \in X}, \prec)$ where

- $\forall x \in X$, $I_x \in \mathcal{I}_x$

- if $X_\infty$ denotes the set $\{(y, z) \in X^2 \mid I_y$ or $I_z$ is non bounded$\}$, then
  $\forall (y, z) \in X_\infty$, $J_{y,z} \in \mathcal{J}_{y,z}$

- $\prec$ is a total preorder on $X_0 = \{x \in X \mid I_x$ is an interval of the form $]c; c+1[\}$

is the following subset of $\mathbb{T}^X$:

$\{ v \in \mathbb{T}^X \mid \forall x \in X,\ v(x) \in I_x;\ \ \forall x, y \in X_0,\ x \prec y \iff frac(v(x)) \leq frac(v(y));\ \ \forall (y, z) \in X_\infty,\ v(y) - v(z) \in J_{y,z} \}$

In fact, we do not have to keep in mind the values $d^-_{y,z}$, as $y$ and $z$ play symmetrical
roles and $d^-_{y,z}$ is equal to $-d^+_{z,y}$; thus we set $d_{y,z} = d^+_{y,z}$. The set of all regions
defined in such a way will be denoted by $\mathcal{R}_{(c_x)_{x \in X}, (d_{y,z})_{y,z \in X}}$.




Example 4. Assume that we have only two clocks
$x$ and $y$ and that the maximal constants are $c_x =
3$ and $c_y = 2$, with clock constraints $x - y \sim 0$ and
$x - y \sim 1$. Then, the set of regions associated with
those constants is described in the figure beside.
The gray region is defined by $I_x = ]3; +\infty[$, $I_y =
]2; +\infty[$ and $-1 < y - x < 0$ (i.e. $J_{y,x}$ is $]-1; 0[$).



The region $Succ(R)$ can be defined in a way similar to the one used in the
diagonal-free case. We also have to notice that this set of regions is compatible
with the clock constraints we consider.

We define the set $\mathcal{U}_2(X)$ of local updates $up = (up_x)_{x \in X}$ where for each
clock $x$, $up_x$ is one of the following simple updates:

$x := c \mid x := y \mid x :< c \mid x :\leq c$

From the undecidability results of Section 4,
we have to restrict the updates used if we
want to preserve decidability. For example, if
we consider the update $y := y + 1$ and the
regions described in the figure beside, the images
of region 1 are the regions 1, 2 and 3.
But we cannot reach region 1 (resp. 2, resp.
3) from every point of region 1. Thus, this set of
regions does not seem to be compatible with
the update $y := y + 1$.



By constructions similar to the ones of Section 6, we obtain the following
theorem:

Theorem 5. Let $C \subseteq C(X)$ be a set of clock constraints. Let
$U \subseteq U_2(X)$ be a set of updates. Let $(c_x)_{x \in X}$ and
$(d_{y,z})_{y,z \in X}$ be families of constants such that

- for each clock constraint $y \sim c$ of C, condition $c \le c_y$ holds,

- for each clock constraint $x - y \sim c$, condition $c \le d_{x,y}$ holds,

- for each update $y :< c$ or $y :\le c$ or $y := c$, it holds $c \le c_y$, and
  for each clock z, condition $c_z \ge c + d_{y,z}$ holds,

- for each update $y := z$, condition $c_y \le c_z$ holds.

Then the family of regions $\mathcal{R}_{(c_x)_{x \in X},(d_{y,z})_{y,z \in X}}$
is compatible with C and U.

Thus, the class Aut(C, U) is decidable, and as in the previous case, testing
emptiness of updatable timed automata is PSPACE-complete (unlike the case
of diagonal-free updates, the previous system of Diophantine equations always
has a solution).
Example 5. We take the regions we used before. We want to compute the updating
successors of the region 0 by the update $x :< 2$. The four updating successors
are drawn in the accompanying figure. Their equations are:

- Region 1: $I'_x = [0]$ and $I'_y = ]2; +\infty[$

- Region 2: $I'_x = ]0; 1[$, $I'_y = ]2; +\infty[$ and $J'_{y,x} = ]1; +\infty[$

- Region 3: $I'_x = [1]$ and $I'_y = ]2; +\infty[$

- Region 4: $I'_x = ]1; 2[$, $I'_y = ]2; +\infty[$ and $J'_{y,x} = ]1; +\infty[$

8 Conclusion 

The main results of this paper about the emptiness problem are summarized in 
the following table: 




$U_0(X) \cup \cdots$                                         $C_{df}(X)$    $C(X)$

$\emptyset$                                                  PSPACE         PSPACE
$\{x := c \mid x \in X\} \cup \{x := y \mid x, y \in X\}$    PSPACE         PSPACE
$\{x :< c \mid x \in X, c \in Q^+\}$                         PSPACE         PSPACE
$\{x := x + 1 \mid x \in X\}$                                PSPACE         Undecidable
$\{x :> c \mid x \in X, c \in Q^+\}$                         PSPACE         Undecidable
$\{x :> y \mid x, y \in X\}$                                 PSPACE         Undecidable
$\{x :< y \mid x, y \in X\}$                                 PSPACE         Undecidable
$\{x :\sim y + c \mid x, y \in X, c \in Q^+\}$               PSPACE         Undecidable
$\{x := x - 1 \mid x \in X\}$                                Undecidable    Undecidable



One of the surprising facts of our study is that the frontier between what is
decidable and not depends on the diagonal constraints (except for the
$x := x - 1$ update), whereas it is well-known that diagonal constraints do not
increase the expressive power of classical timed automata.

Note that, as mentioned before, the decidable classes are not more powerful than 
classical timed automata in the sense that for any updatable timed automaton 
of such a class, a classical timed automaton (with $\varepsilon$-transitions)
recognizing the same language - and even most often bisimilar - can be
effectively constructed [BDFP00b]. However, in most cases an exponential blow-up
seems unavoidable.
This means that transforming updatable timed automata into classical timed 
automata cannot constitute an efficient strategy to solve the emptiness problem. 
In the existing model-checkers, time is represented through data structures like 
DBM (Difference Bounded Matrix) or CDD (Clock Difference Diagrams). An 
interesting and natural question is to study how such structures can be used to 
deal with updatable timed automata. 

Acknowledgements: We thank Beatrice Berard for helpful discussions. 
Abstract. Bounded Model Checking based on SAT methods has recently been
introduced as a complementary technique to BDD-based Symbolic Model Checking.
The basic idea is to search for a counter example in executions whose length is
bounded by some integer k. The BMC problem can be efficiently reduced to a
propositional satisfiability problem, and can therefore be solved by SAT methods
rather than BDDs. SAT procedures are based on general-purpose heuristics that are
designed for any propositional formula. We show that the unique characteristics
of BMC formulas can be exploited for a variety of optimizations in the SAT
checking procedure. Experiments with these optimizations on real designs proved
their efficiency in many of the hard test cases, compared with both the standard
SAT procedure and a BDD-based model checker.



1 Introduction 

The use of SAT methods for Symbolic Model Checking has recently been introduced
in the framework of Bounded Model Checking [4]. The basic idea is to search for
a counter example in executions whose length is bounded by some integer k. The
BMC problem can be efficiently reduced to a propositional satisfiability
problem, and can therefore be solved by SAT methods rather than BDDs. SAT
procedures do not suffer from the potential space explosion of BDDs and can
handle propositional satisfiability problems with thousands of variables. The
first experiments with this idea showed that if k is small enough, or if the
model has certain characteristics, the SAT-based approach outperforms BDD-based
techniques [5].

SAT procedures are based on general-purpose heuristics that are designed for any
propositional formula. In this paper we will show that the unique
characteristics of BMC formulas can be exploited for a variety of optimizations
in the SAT checking procedure. These optimizations were implemented on top of
CMU’s BMC [4]¹ and the SAT checker Grasp [11,12], without making use of features
that are unique to either one of them.

¹ We distinguish between the tool BMC and the method BMC.
We benchmarked the various optimizations, and also compared them to results
achieved by RuleBase, IBM’s BDD-based Model Checker [1,2]. RuleBase is
considered one of the strongest verification tools on the market, and includes
most of the reductions and BDD optimizations that have been published in recent
years. The benchmark’s database included 13 randomly selected ’real-life’
designs from IBM’s internal benchmark set. Instances trivially solved by
RuleBase are typically not included in this set, a fact which clearly creates a
statistical bias in the results. Thus, although we will show that in 10 out of
the 13 cases the improved SAT procedure outperformed RuleBase, we cannot
conclude from this that in general it is a better method. However, we can
conclude that many of the problems that are hard for BDD-based model checking
can easily be solved by the improved SAT procedure. A practical conclusion is
therefore that the best strategy would be to run several engines in parallel,
and then present the user with the fastest result.

Our results are compatible with [5] in the sense that their experiment also 
showed a clear advantage of SAT when k is small, and when the design has 
specific characteristics that make BDDs inefficient. We found it hard to predict 
which design can easily be solved by BMC, because the results are not strictly 
monotonic in k or the size of the design. We have one design that could not 
be solved with BMC although there was a known bug in cycle 14, and another 
design which was trivially solved, although it included a bug only in cycle 38. 
The SAT instance corresponding to the second design was 5 times larger than 
the first one, in terms of number of variables and clauses. We also found that 
increasing k in a given design can speed up the search. This can be explained,
perhaps, by the fact that increasing k can cause an increase in the ratio of 
satisfying to unsatisfying assignments. 

The rest of this paper is organized as follows: in the next two sections we
describe in more detail the theory and practice of BMC and SAT. In Section 4
we describe various BMC-specific optimizations that we applied to the SAT
procedure. In Sections 5 and 6 we list our experimental results, and our
conclusions from them.



2 BMC - The Tool and the Generated Formulas 

The general structure of an AGp formula, as generated in BMC, is the following:

\[
\varphi : \; I_0 \wedge \bigwedge_{i=0}^{k-1} \rho(i, i+1) \wedge \Bigl( \bigvee_{i=0}^{k} \neg P_i \Bigr) \qquad (1)
\]

where $I_0$ is the initial state, $\rho(i, i+1)$ is the transition between
cycles i and i + 1, and $P_i$ is the property in cycle i. Thus, this formula can
be satisfied iff for some i ($i \le k$) there exists a reachable state in cycle
i which contradicts the property $P_i$. Focusing on potential bugs in a specific
cycle can be formulated by simply restricting the disjunction over $P_i$ to the
appropriate cycle. BMC takes an SMV-compatible model and generates a
propositional SAT instance according to Equation (1). The size of the generated
formula is linear in k, and indeed empirical results show that k strongly
affects the performance. As a second step, BMC transforms the formula to CNF. To
avoid the potential exponential growth of the formula associated with this
translation, it adds auxiliary variables, and performs various optimizations.
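The meaning of Equation (1) can be illustrated with a minimal sketch that
enumerates paths explicitly instead of calling a SAT solver. Everything named
here (`bmc_find_bug`, the 2-bit counter model, the predicates `init`, `trans`
and `prop`) is a hypothetical illustration and not part of the BMC tool; a real
encoding would emit CNF clauses rather than Python predicates.

```python
from itertools import product


def bmc_find_bug(init, trans, prop, num_bits, k):
    """Search for a path s_0..s_k that satisfies Equation (1).

    init(s), trans(s, t) and prop(s) are Python predicates standing in for
    I_0, rho(i, i+1) and P_i; states are tuples of num_bits Booleans.
    Returns the first such path found, or None if no path of this length
    violates the property.
    """
    states = list(product([False, True], repeat=num_bits))
    for path in product(states, repeat=k + 1):
        if not init(path[0]):
            continue  # I_0 does not hold
        if not all(trans(path[i], path[i + 1]) for i in range(k)):
            continue  # some rho(i, i+1) does not hold
        if any(not prop(s) for s in path):
            return path  # the disjunction over "not P_i" holds
    return None


# Hypothetical model: a 2-bit counter that starts at 0 and counts modulo 4.
def value(s):
    return 2 * int(s[0]) + int(s[1])


def init(s):
    return value(s) == 0


def trans(s, t):
    return value(t) == (value(s) + 1) % 4


def prop(s):
    # Property under test: the counter never reaches 3.
    return value(s) != 3
```

With this toy model the bound matters in the way the text describes: no
violation exists for k = 2, while k = 3 exposes the path 0, 1, 2, 3.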

Every ACTL* formula (ACTL* is the subset of CTL* that contains only universal
path quantifiers) can be reduced to a SAT instance, under bounded semantics [4].
While all safety properties can be expressed in the form of AGp [3], to handle
temporal operators such as AFp, BMC adds to $\rho$ the disjunction
$\bigvee_{i=0}^{k-1} \rho(k, i)$, thus capturing the possibility of a loop in
the state transition graph. Fairness is handled by changing the loop condition
to include at least one state which preserves the fairness condition.



3 SAT Checkers and Grasp 



In this section we briefly outline the principles followed by modern propositional 
SAT-checkers, and in particular those that Grasp (Generic seaRch Algorithm 
for the Satisfiability Problem) is based on. Our description follows closely the 
one in [11]. 

Most of the modern SAT-checkers are variations of the well known Davis-Putnam
procedure [7]. The procedure is based on a backtracking search algorithm that,
at each node in the search tree, chooses an assignment (i.e. both a variable and
a Boolean value, which determines the next subtree to be traversed) and prunes
subsequent searches by iteratively applying the unit clause rule. Iterated
application of the unit clause rule is commonly referred to as Boolean
Constraint Propagation (BCP). The procedure backtracks once a clause is found
to be unsatisfiable, until either a satisfying assignment is found or the search
tree is fully explored. The latter case implies that the formula is
unsatisfiable.
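The backtracking search with BCP described above can be sketched as follows.
This is a minimal illustration under our own assumptions, not Grasp's
implementation: clauses are lists of non-zero integers in the usual DIMACS
convention (a negative number is a negated variable), and the helper names
`unit_propagate` and `dpll` are hypothetical.

```python
def unit_propagate(clauses, assignment):
    """Apply the unit clause rule until a fixpoint is reached (BCP).

    Returns the extended assignment (a dict variable -> bool), or None if
    some clause has all of its literals falsified (a conflict).
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]  # unit clause: this literal is forced
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


def dpll(clauses, assignment=None):
    """Chronological backtracking search; returns a model or None."""
    assignment = unit_propagate(clauses, assignment or {})
    if assignment is None:
        return None
    variables = {abs(lit) for clause in clauses for lit in clause}
    free = sorted(variables - set(assignment))
    if not free:
        return assignment  # all variables decided without conflict
    var = free[0]  # naive decision: lowest-numbered free variable
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

The decision step here is deliberately naive; the heuristics discussed in the
rest of the paper replace exactly that choice of variable and value.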

A more generic description of a SAT algorithm was introduced in [11]. A 
simplified version of this algorithm is shown in Fig. 1. 

At each decision level d in the search, a variable assignment
$V_d \in \{T, F\}$ is selected with the Decide() function. If all the variables
are already decided (indicated by ALL-DECIDED), it implies that a satisfying
assignment has been found, and SAT returns SATISFIABLE. Otherwise, the implied
assignments are identified with the Deduce() function, which in most cases
corresponds to a straightforward BCP. If this process terminates with no
conflict, the procedure is called recursively with a higher decision level.
Otherwise, Diagnose() analyzes the conflict and decides on the next step. First,
it identifies those assignments that led to the conflict. Then it checks if the
assignment to $V_d$ is one of them. If the answer is yes, it implies that the
value assigned to $V_d$ should be swapped and the deduction process in line l3
is repeated. If the swapped assignment also fails, it means that $V_d$ is not
responsible for the conflict. In this case Diagnose() will indicate that the
procedure should BACK-TRACK to a lower decision level $\beta$ ($\beta$ is a
global variable that can only be changed by Diagnose()). The procedure
// Input arg:    Current decision level d
// Return value:
//   SAT():      {SATISFIABLE, UNSATISFIABLE}
//   Decide():   {DECISION, ALL-DECIDED}
//   Deduce():   {OK, CONFLICT}
//   Diagnose(): {SWAP, BACK-TRACK}

SAT (d)
{
l1:   if (Decide (d) == ALL-DECIDED) return SATISFIABLE;
l2:   while (TRUE) {
l3:       if (Deduce (d) != CONFLICT) {
l4:           if (SAT (d+1) == SATISFIABLE) return SATISFIABLE;
l5:           else if (β < d || d == 0)   // β is calculated in Diagnose ()
l6:               { Erase (d); return UNSATISFIABLE; }
          }
l7:       if (Diagnose (d) == BACK-TRACK) return UNSATISFIABLE;
      }
}



Fig. 1. Generic backtrack search SAT algorithm 



will then backtrack $d - \beta$ times, each time Erase()-ing the current
decision and its implied assignments, in line l6.

Different SAT procedures can be modeled by this generic algorithm. For example,
the Davis-Putnam procedure can be emulated with the above algorithm by
implementing BCP and the pure literal rule in Deduce(), and implementing
chronological backtracking (i.e. $\beta = d - 1$) in Diagnose(). Modern SAT
checkers include Non-chronological Backtracking search strategies (i.e.
$\beta = d - j, j \ge 1$). Hence, irrelevant assignments can be skipped over
during the search. The analysis of conflicts can also be used for adding new
constraints (called conflict clauses) on the search. These constraints prevent
the repetition of assignments that lead to conflicts. This way the search
procedure backtracks immediately if a ’bad’ assignment is repeated. For example,
if Diagnose() concludes that the assignment x = T, y = F, z = F inevitably leads
to a conflict, it adds the conflict clause $\pi = (\neg x \vee y \vee z)$ to
$\varphi$.
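The step from a conflicting assignment to its conflict clause is a plain
negation, which the following sketch makes explicit. The representation (a
literal as a pair of variable name and polarity) and the function names are
assumptions made for illustration only.

```python
def conflict_clause(assignment):
    """Negate a conflicting partial assignment into a clause.

    assignment maps variable names to Booleans. Each literal of the result
    is a pair (name, polarity), where polarity False means a negated variable.
    """
    return [(var, not value) for var, value in assignment.items()]


def is_falsified(clause, assignment):
    """True iff every literal of the clause is assigned and false."""
    return all(
        var in assignment and assignment[var] != polarity
        for var, polarity in clause
    )


# The example from the text: x = T, y = F, z = F leads to a conflict.
bad = {"x": True, "y": False, "z": False}
pi = conflict_clause(bad)  # represents (not x) or y or z
```

Once `pi` is part of the formula, any assignment that repeats the three bad
values falsifies it at once, which is what makes the search backtrack
immediately.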

From the large number of Decide() strategies suggested over the years,
experiments with Grasp have demonstrated that the Dynamic Largest Individual
Sum (DLIS) has the best average results [10]. DLIS is a rather straightforward
strategy: it chooses an assignment that leads to the largest number of satisfied
clauses. In this research we only experimented with DLIS, although different
problem domains may be most efficiently solved with different strategies.
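A DLIS-style decision can be sketched as below, assuming the same
integer-literal clause representation as in DIMACS files. The tie-breaking rule
(lowest variable index, then positive polarity) is our own choice to make the
sketch deterministic; it is not taken from Grasp.

```python
from collections import Counter


def dlis_decide(clauses, assignment):
    """Pick the literal that satisfies the most not-yet-satisfied clauses.

    clauses: lists of non-zero ints (negative = negated variable).
    assignment: dict variable -> bool for the variables decided so far.
    Returns (variable, value), or None if no unassigned literal remains
    in an unsatisfied clause.
    """
    counts = Counter()
    for clause in clauses:
        if any(abs(l) in assignment and assignment[abs(l)] == (l > 0)
               for l in clause):
            continue  # clause already satisfied, ignore it
        for lit in clause:
            if abs(lit) not in assignment:
                counts[lit] += 1
    if not counts:
        return None
    # Highest count first; ties go to the lowest variable, then to 'True'.
    best = max(counts, key=lambda l: (counts[l], -abs(l), l > 0))
    return abs(best), best > 0
```

For the clauses `[[1, 2], [1, -3], [-1, 2], [2, 3]]` the literal 2 occurs in
three unsatisfied clauses, so the sketch decides variable 2 with value true.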
4 Satisfiability Checking of BMC Formulas

In this section we describe various BMC-specific optimizations that have been 
implemented on top of Grasp. Many of the optimizations deal with familiar 
issues that are typically associated with BDDs: variable ordering, direction of 
traversal (backward vs. forward), first subtree to traverse, etc.



4.1 Constraints Replication 

The almost symmetric structure of Equation (1) can be used for pruning the
search tree when verifying AGp formulas. In the following discussion let us first
ignore $I_0$, and assume that $\varphi$ is fully symmetric.

Conflict clauses, as explained in Section 3, are used for pruning the search 
tree by disallowing a conflicting sequence (i.e. an assignment that leads to an un- 
satisfied clause) to be assigned more than once. We will use the alleged symmetry 
in order to add replicated clauses, which are new clauses that are symmetric to 
the original conflict clause. Each of these clauses can be seen as a constraint on 
the state-space which, on the one hand preserves the satisfiability of the formula 
and, on the other hand, prunes the search tree. 

Let us illustrate this concept by an example. Suppose that Diagnose() concluded
that the assignment $x_4 = T, y_7 = F, z_5 = F$ always leads to a conflict (the
subscript number in our notation is the cycle index that the variable refers
to). In this case it will add the conflict clause
$\pi = (\neg x_4 \vee y_7 \vee z_5)$ to $\varphi$. We claim that the symmetry of
Equation (1) implies that, for example, the assignment
$x_3 = T, y_6 = F, z_4 = F$ will also lead to a conflict, and we can therefore
add the replicated clause $\pi' = (\neg x_3 \vee y_6 \vee z_4)$ to $\varphi$.
Let us now generalize this analysis. Let $\delta$ be the difference between the
largest and lowest index of the variables in $\pi$ (in our case
$\delta = 7 - 4 = 3$). For all $0 \le i \le k - \delta$, the assignment
$x_i = T, y_{i+3} = F, z_{i+1} = F$ will also result in a conflict and we can
therefore add the replicated clause
$\pi_i = (\neg x_i \vee y_{i+3} \vee z_{i+1})$.
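The generalization above amounts to shifting every cycle index of the conflict
clause by the same offset. A minimal sketch, assuming a clause is a list of
triples (name, cycle, polarity) and ignoring $I_0$ and any reductions of the
formula, could look as follows; the function name is hypothetical.

```python
def replicate(clause, k):
    """Return all shifted copies of a conflict clause within cycles 0..k.

    clause: list of (name, cycle, polarity) triples, polarity False meaning
    a negated variable. The original clause itself is not repeated.
    """
    lowest = min(cycle for _, cycle, _ in clause)
    highest = max(cycle for _, cycle, _ in clause)
    delta = highest - lowest
    copies = []
    for i in range(0, k - delta + 1):  # i is the new lowest cycle index
        shift = i - lowest
        if shift == 0:
            continue  # this is the original conflict clause
        copies.append([(n, c + shift, p) for n, c, p in clause])
    return copies


# The clause from the text: (not x_4) or y_7 or z_5, with k = 10.
pi = [("x", 4, False), ("y", 7, True), ("z", 5, True)]
copies = replicate(pi, 10)
```

For k = 10 this yields seven replicated clauses, from
$(\neg x_0 \vee y_3 \vee z_1)$ up to $(\neg x_7 \vee y_{10} \vee z_8)$.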



Yet $\varphi$ is not fully symmetric, because of $I_0$ and because of the
Bounded Cone of Influence (BCOI) reduction [5]². The BCOI reduction eliminates
variables that are not affecting the property up to cycle k. It can eliminate,
for example, $x_i$ for $k - 3 < i \le k$ and $y_j$ for $k - 5 < j \le k$.
Consequently cycle k - 5 will not be symmetric anymore to cycle k - 5 in
$\varphi$. Typically variables are eliminated only from the right hand side,
i.e., if a variable $x_k$ is not eliminated, then for all i < k, $x_i$ is also
not eliminated. In the following discussion we concentrate on this typical case.
Minor adjustments are needed for the general case.

There are two options to handle the asymmetry caused by the BCOI reduction. One
option is to restrict the replicated clauses to
$0 \le i \le k - \delta - \Delta$, where $\Delta$ is the number of cycles
affected by the BCOI reduction. Another option is to add replicated clauses as
long as all their variables are contained in the BCOI.

² This is in addition to several other manipulations that BMC performs on
$\varphi$ which are easy to overcome, and will not be listed here.

The second option can be formalized as follows. Let C be the set of variables
in the conflict clause. For a variable $\sigma \in C$, denote by
$k_\sigma \le k$ the highest index s.t. $\sigma_{k_\sigma}$ is a variable in
$\varphi$ (without the BCOI reduction, $k_\sigma = k$ for all variables) and by
$i_\sigma$ the index of $\sigma$ in C. Also, let $min_C = \min\{i_\sigma\}$ and
$\psi = \min\{(k_\sigma - i_\sigma)\}$ for all $\sigma \in C$. Intuitively,
$\psi$ is the maximum number of clauses we can add to the ’right’ (i.e. with a
higher index) of the conflict clause. We now add replicated clauses s.t. the
variable $\sigma$ for which $i_\sigma = min_C$ ranges from 0 to $min_C + \psi$.

Example 1. For the conflict clause $\pi = (\neg x_4 \vee y_7 \vee z_5)$, we have
$C = \{x_4, y_7, z_5\}$ and $min_C = 4$. Suppose that $k_x = 5$, $k_y = 10$ and
$k_z = 7$. Also, suppose that k = 10 and $\Delta = 5$ (since $k_x = 5$, $\Delta$
has to be greater than or equal to (10 - 5) = 5). According to the first option,
x will range from 0 to (10 - 5 - (7 - 4)) = 2. Thus, the replicated clauses will
be $(\neg x_0 \vee y_3 \vee z_1) \ldots (\neg x_2 \vee y_5 \vee z_3)$. According
to the second option, we calculate
$\psi = \min((5 - 4), (10 - 7), (7 - 5)) = 1$, and therefore x will range from 0
to (4 + 1) = 5. Thus, this time the right most clause will be
$(\neg x_5 \vee y_8 \vee z_6)$. □

Example 1 demonstrates that the second option allows for more replicated clauses
to be added, and is therefore preferable.

The influence of $I_0$ is not bounded, and can propagate up to cycle k.
Therefore a simple restriction on the replicated clauses is insufficient. A
somewhat ’brute-force’ solution is to simulate an assignment for every potential
replicated clause (i.e. assign values that satisfy the complement of $\pi_i$)
and check if it leads to a conflict. The overhead of this option is rather
small, since it only requires assigning $|\pi_i|$ variables and then calling
Deduce() once. If this results in a conflict, we can add $\pi_i$ to the formula.
However, the addition of wrong clauses can only lead to false positives, and
therefore we can skip the simulation and refer to constraint replication as an
under-approximation method (this also implies that for the purpose of faster
falsification, many other under-approximation heuristics can be implemented by
adding clauses to $\varphi$). Hence, we can first skip the simulation, and only
if the formula is unsatisfiable, run it again with simulation.

The overhead of adding and simulating the replicated clauses is small in 
comparison to its benefit. In all the test cases we examined, as will be shown in 
Section 5, the replicated clauses accelerated the search, although not dramati- 
cally. 

4.2 Static Ordering 

The variable ordering followed by dynamic Decide() procedures (such as the
previously mentioned DLIS strategy) is constructed according to various ’greedy’
criteria, which do not utilize our knowledge of $\varphi$’s structure. A typical
scenario when using these procedures, in the context of BMC formulas, is that
large sets of clauses associated with distant cycles are being satisfied
independently, until they ’collide’, i.e. it is discovered that the assignments
that satisfied them contradict each other. Fig. 2 demonstrates this scenario, by
showing two distant sets of assigned variables (around the 5th and 20th cycles),
that grow independently until at some point they collide. Similarly, they can
collide with the constraints imposed by the initial state $I_0$ or the negation
of the property in cycle k. To resolve this conflict, it may be necessary to go
back hundreds of variables up the decision tree. We claim that this phenomenon
can potentially be avoided by guiding the search according to the (k-unfolding
of the) Variable Dependency Graph (VDG). This way conflicts will be resolved on
a more ’local’ level, and consequently less time will be wasted in backtracking.



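[Figure: the unfolded variables laid out between $I_0$ (left) and $\neg P_k$
(right); two sets of assigned variables, around $V_5$ and $V_{20}$, grow until
they reach a conflict.]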



Fig. 2. With default dynamic ordering strategies, it is common that distant sets
of variables are assigned values independently. We refer the reader to a
technical report [9], where we show snapshots of the number of variables from
each cycle that are assigned a value at a given moment. These charts show that
this phenomenon indeed occurs when using these strategies.



The most natural way to implement such a strategy is to predetermine a static
order, following either a forward or a backward Breadth-First Search (BFS) on
the VDG. Indeed, our experiments have shown that in most cases this strategy
speeds up the search.
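A backward BFS order over the k-unfolding of a dependency graph can be sketched
as follows. The sketch assumes, for simplicity, that a variable in cycle i
depends only on variables in cycle i - 1 and that the graph is given as a
dictionary; both the representation and the function name are hypothetical.

```python
from collections import deque


def backward_bfs_order(deps, prop_vars, k):
    """Static variable order from a backward BFS on the unfolded VDG.

    deps[v] lists the variables that v's next-state function reads.
    The search starts from the property variables in cycle k and walks
    towards cycle 0. Returns a list of (variable, cycle) pairs.
    """
    queue = deque((v, k) for v in prop_vars)
    seen = set(queue)
    order = []
    while queue:
        var, cycle = queue.popleft()
        order.append((var, cycle))
        if cycle == 0:
            continue  # initial-state variables have no predecessors
        for dep in deps.get(var, []):
            node = (dep, cycle - 1)
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return order


# Hypothetical design: property p reads latches a and b, which read each other.
deps = {"p": ["a", "b"], "a": ["b"], "b": ["a"]}
order = backward_bfs_order(deps, ["p"], 2)
```

A forward order can be obtained in the same way by reversing the edges and
starting from the variables of cycle 0.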



Ordering strategies. We now investigate variations of the BFS strategy. Let
us first assume that we are looking for a counter example in a particular cycle
k. In this case a strict backward traversal may spend a significant amount of
time in paths which include unreachable states. This fact will be revealed only
when the search reaches $\overline{I_0}$ (we denote the set of variables in a
sub-formula $\psi$ by $\overline{\psi}$), which is placed last in the suggested
order. Enforcing a static forward traversal, on the other hand, may result in a
prolonged search through legal paths (i.e. paths that preserve the property),
that will be revealed only when $\overline{P_k}$ is decided (these are the two
’walls’ in Fig. 2). A similar dilemma is associated with BDD-based techniques
(see for example [6] and [8]). It seems that the (unknown) ratio between the
number of paths that go through unreachable states and the number of legal paths
is crucial for determining the most efficient direction of traversal in both
methodologies.

The strict backward or forward BFS causes the constraints, either on the first
or the k-th cycle, to be considered only in a very ’deep’ decision level, and
the number of backtracks will consequently be very high, sometimes higher than
the default dynamic strategies. Another problem with straight BFS results from
the very large number of variables in each cycle. Typically there are hundreds
or even thousands of variables in each cycle. It creates a large gap between
each variable and its immediate neighbors in the VDG, and therefore conflicts
are not resolved as locally as we would like.

These two observations indicate that the straightforward BFS solution should be
altered. On the one hand, we should keep a small distance between
$\overline{P_0}$ and $\overline{P_k}$, and on the other hand we should follow
the VDG as closely as possible. This strategy can be achieved, for example, by
triggering the BFS with a set S of a small number of variables from each cycle.
As a minimum, it has to include $\overline{P_k}$ (otherwise not all the
variables will be covered by the search). Different strategies can be applied
for choosing the variables from the other cycles. For example, we can choose
$\overline{P_i}$ for all $0 \le i < k$.³

When we generalize our analysis and assume that we are looking for a counter
example in the range 0..k, the set
$S := \bigcup_{0 \le i \le k} \overline{P_i}$ is the smallest initial set which
enables the BFS procedure to cover the full set of variables in a single path.
Initial sets smaller than S will require more than one path. This will split the
set of variables of each cycle into a larger number of small sets, and
consequently create a big gap between them (i.e. between each node and its
siblings on the graph). If two such distant siblings are assigned values which
together contradict their parent node, then the backtrack ’jump’ will be large.
Increasing S, on the other hand, will create a large gap between neighboring
variables on the VDG (i.e. between a node and its sons on the graph). This
tradeoff indicates that a single optimal heuristic for all designs probably does
not exist, and that only experiments can help us to fine-tune S.

There are, of course, numerous other possible ordering strategies. As with BDDs,
the variable ordering has a crucial influence on the efficiency of the
procedure, yet an ordering heuristic which is optimal for all designs is hard to
find.



Unsatisfiable instances. A major consideration in designing SAT solvers is
their efficiency in solving unsatisfiable instances. The various optimizations
(e.g. conflict clauses, non-chronological backtracking) are as helpful in these
cases as they are with satisfiable instances. However, while satisfiable
instances can be solved fast by a good ’guess’ of assignments, an instance can
be proven to be unsatisfiable only after an exhaustive exploration of the
state-space.

We now show that the order imposed by the previously suggested backward BFS is
particularly good for unsatisfiable BMC formulas. In the following discussion we
denote $\varphi$’s sub-formulas $\bigvee_{i=0}^{k} \neg P_i$ and
$\bigwedge_{i=0}^{k-1} \rho(i, i+1)$ by P and $\rho$ respectively.

Let us assume that the property holds up to cycle k, and consequently $\varphi$
is unsatisfiable. Since the transition relation $\rho$ is consistent⁴, a
contradiction in $\varphi$ will not be found before the first variables from P
are decided. Yet, since typically $|P| \ll |\rho|$, it is possible that the
search will backtrack on $\rho$’s variables for a very long time before it
reaches P. Thus, by forcing the search to begin with P, we may be able to avoid
this scenario. However, starting from P is not necessarily enough, because this
way we only shift the problem to the variables that define P. It is clear that a
BFS backwards on the dependency graph, from the property variables to the
initial state, is a generalization of this idea and should therefore speed up
the proof of unsatisfiability.

³ This is not always possible because for i < k, $P_i$ might be removed by the
BCOI reduction.

⁴ Inconsistent transition relations can occur, but typically can also be
trivially detected.

4.3 Choosing the Next Branch in the Search Tree 

The proposed static ordering does not specify the Boolean value given to each 
variable. This is in contrast to the dynamic approach where this decision is 
implicit. Here are four heuristics that we examined: 

1. Dynamic decision. The value is chosen according to one of the dynamic
Decide() strategies, which are originally meant for deciding both on the
variable and its value. For example, the DLIS strategy chooses the value
that satisfies the largest number of clauses.

2. Constant, or random decision. The most primitive decision strategy is to
constantly assign either ’0’ or ’1’ to the chosen variable, or alternatively, to
choose this value randomly. As several experiments have shown in the past
[10], choosing a random or a constant value is not a priori inferior to dynamic
decision strategies, as one might expect. Any dynamic decision strategy can
lead to the ’wrong side of the tree’, i.e. can cause the search to focus on
an unsatisfiable sub-tree. Apparently constant or random decisions in many
cases avoid this path and consequently speed up the search.

3. Searching for a flat counter example. Analysis of bugs in real designs leads
to the observation that most of them can be reached by computations which
are mostly ’flat’, i.e. computations where the frequency in which the majority
of the variables swap their values is low. This phenomenon can be exploited
when ’guessing’ the next subtree to be traversed. Suppose that the Decide()
function chose to assign a variable $x_i$ for some $0 \le i \le k$. Let $x_l$
and $x_r$ be the left and right closest neighboring variables of $x_i$ that are
already assigned a value at this point (if no such variable exists, we will say
that $x_l$, or $x_r$, is equal to $\bot$). To construct a flat counter example,
if $x_l = x_r$ we will assign $x_i$ their common value. The following simple
procedure generalizes this principle:

    l = largest number s.t. l < i and x_l is assigned.
    r = smallest number s.t. r > i and x_r is assigned.
    if x_l ≠ ⊥
        if (x_l = x_r || x_r = ⊥) return x_l; else return {T,F};
    else
        if (x_r ≠ ⊥) return x_r; else return {T,F};

The non-deterministic choice can be replaced by one of the heuristics that
were suggested above (e.g. dynamic, constant).
4. Repeating previous assignments. When the search engine backtracks from
decision level d to $\beta$, all the assignments that are either Decide()-d or
Deduce()-d between these two levels are undone by Erase(). We claim that
repeating previous assignments can reduce the number of backtracks. This
is because we know that all assignments between levels $\beta + 1$ and d do not
contradict one another nor do they contradict the assignments with decision
level lower than $\beta$ (otherwise the procedure would backtrack before level
d). In order to decide on each variable’s value for the first time, this
strategy should be combined with one of the strategies that were described
before.
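The procedure given for the flat counter example heuristic (item 3 above) can be
written out as the following sketch. The list-based representation, in which
`None` plays the role of $\bot$, and the `fallback` parameter that replaces the
non-deterministic choice are assumptions made for illustration.

```python
import random


def flat_value(values, i, fallback=None):
    """Choose a value for x_i that keeps the computation 'flat'.

    values holds the current values of x_0..x_k, with None for unassigned
    copies. fallback is an optional zero-argument function that replaces
    the non-deterministic choice {T, F}; a random choice is used otherwise.
    """
    # Closest assigned neighbor to the left and to the right of position i.
    left = next((values[j] for j in range(i - 1, -1, -1)
                 if values[j] is not None), None)
    right = next((values[j] for j in range(i + 1, len(values))
                  if values[j] is not None), None)
    if left is not None:
        if right is None or left == right:
            return left
    elif right is not None:
        return right
    # Neighbors disagree, or no neighbor is assigned at all.
    return fallback() if fallback else random.choice([True, False])
```

Passing, say, `fallback=lambda: True` corresponds to combining the flat
heuristic with the constant value ’1’.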

4.4 A Combined Dynamic and Static Variable Ordering 

The static ordering can be combined in various ways with the more traditional 
dynamic procedures. We have implemented two such strategies: 

1. Two phase ordering. The static traversal is used for the first $max_s$
variables, and then the variables are Decide()-d dynamically.

2. Sliding window. Variables are chosen dynamically from a small set of
variables, corresponding to a ’window’ which progresses along the static order
that we chose. Let $V : v_1..v_n$ be the static variable ordering, and let
$V' : v'_1..v'_m$ be the (ordered) subset of V’s variables that are currently
not assigned a value. Let $1 \le w \le k$ be an arbitrary number denoting the
window size. In each step, a variable is dynamically chosen from the set of
variables that are within the borders of the window $v'_1..v'_w$. Note that the
two extreme ends of w, namely w = 1 and w = k, correspond to the pure static and
dynamic orderings, respectively.
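The sliding window strategy can be sketched in a few lines. The `score`
function stands in for whatever dynamic criterion is used (for instance a
DLIS-style count); it and the function name are hypothetical.

```python
def window_decide(static_order, assignment, w, score):
    """Pick the next decision variable from a window on the static order.

    static_order: list of variables in the predetermined static order.
    assignment:   dict of variables that already have a value.
    w:            window size (number of unassigned variables considered).
    score:        dynamic criterion; the highest-scoring variable wins.
    Returns None when every variable is assigned.
    """
    unassigned = [v for v in static_order if v not in assignment]
    if not unassigned:
        return None
    window = unassigned[:w]  # the first w unassigned variables
    return max(window, key=score)


# Hypothetical static order and dynamic scores.
order = ["a", "b", "c", "d"]
scores = {"a": 1, "b": 5, "c": 9, "d": 7}
```

With w = 1 the sketch follows the static order exactly, and with a window that
covers all unassigned variables it degenerates to the purely dynamic choice.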

4.5 Restricting Decide () to Dominating Variables 

While $\varphi$ typically contains tens of thousands of variables, not more than
10%-20% of them are the actual model’s variables. The remaining 80%-90% are
auxiliary variables that were added to $\varphi$ in order to generate a compact
CNF formula. It is clear that the model’s variables are sufficient for deciding
the satisfiability of the formula, and therefore it should be enough to Decide()
only them (however, if the formula has more than one satisfying assignment, some
of the auxiliary variables should be assigned too). The same argument can be
applied to a much smaller set of variables: the inputs. The input variables are
typically less than 5% of the total number of variables, and can determine alone
the satisfiability of the formula⁵. Thus, if we restrict Decide() to one of
these small sets, we potentially reduce the depth of the decision tree, at the
expense of more Deduce() operations.

⁵ Here we assume that all non-deterministic assignments are replaced by
conditional assignments, where the ’guard’ of the condition is a new input
variable.
5 Experimental Results

The benchmark included 13 designs, on a ’one property per design’ basis. The
properties were proven in the past to be false, and the cycle in which they
fail was known as well. Thus, the benchmark focuses on a narrow view of the
problem: the time it takes to find a bug in cycle k, when k is known in advance.
The iterative process of finding k is, of course, time consuming, and its cost
might be more significant than any small time gap between BMC and regular model
checking.

The results presented in Fig. 3 summarize some of the more interesting
configurations which we experimented with. In Fig. 4 we present more
information regarding the SAT instance of each case study (the number of
variables and clauses) as well as some other Grasp configurations which were
generally less successful. The right-most column in this figure includes the
time it takes to prove that there is no bug up to cycle k - 1, with the SM
configuration. These figures are important for evaluating the potential
performance differences between satisfiable and unsatisfiable instances.

We present results achieved by RuleBase under two different configurations.
RB1 is the default configuration, with dynamic reordering. RB2 is the same
configuration without reordering, but the initial order is taken from the order
that was calculated with RB1. These two configurations represent a typical
scenario of model checking with RuleBase. Each time reordering is activated,
the initial order is potentially improved and saved in a special order file for
future runs. Thus, RB2 results can be further improved.

RuleBase results are compared with various configurations of Grasp, where 
the first one is simply the default configuration without any of the suggested 
optimizations. 

The following table summarizes the various configurations, where the left 
part refers to Fig. 3 and the right part to Fig. 4: 





                Grasp   +R     +SM     +SMF    +SMR    +SMP    +SMD    +Wi
Ordering:       Dyn     Dyn    Stat    Stat    Stat    Stat    Stat    Win i
Value:          Dyn     Dyn    1       Flat    1       Prev    Dyn     Dyn
Variable set:   All     All    Model   Model   Model   Model   Model   Model
Replication:    No      Yes    No      No      Yes     No      No      No



The Stat ordering refers to the static order suggested in Section 4.2, whereas
Dyn is the default dynamic decision strategy adopted by Grasp (DLIS). Win i
refers to a combined dynamic and static ordering, where variables within a
window of size i are selected dynamically, as explained in Section 4.4. The
'1', Flat and Prev default values refer to the constant, flat and previous
values suggested in Section 4.3 (in +SMP we combined the Prev strategy with the
default value '1'). The Model variable set refers to a restriction on decide()
to model variables only, as described in Section 4.5. Replication refers to
constraint replication and simulation, as explained in Section 4.1. All
configurations include the flag '+g60', which restricts the size of the
conflict clauses (and consequently also the size of the replicated clauses) to
60 literals. With very few exceptions, the other possible configurations did
not perform better than those that are presented.

The test cases in the figure below are separated into two sets: the 10 designs
in the first set demonstrate better results for the optimized SAT procedure, and
the 3 designs in the second set demonstrate an advantage to the BDD-based
procedure. Both sets are sorted according to the RB1 results.



Design #   K    RB1    RB2    Grasp   +R     +SM   +SMF   +SMP   +SMR
1          18   7      6      282     115    3     57     29     4.1
2          5    70     8      1.1     1.1    0.8   1.1    0.7    0.9
3          14   597    375    76      52     3     2069   3      3
4          24   690    261    510     225    12    27     12     12
5          12   803    184    24      24     2     2      2      3
6          22   *      356    *       *      18    16     38     18
7          9    *      2671   10      10     2     1.8    1.9    2
8          35   *      *      6317    2870   20    338    101    74
9          38   *      *      9035    *      25    277    126    96
10         31   *      *      *       9910   312   22     64     330

11         32   152    60     *       *      *     *      *      *
12         31   1419   1126   *       *      *     *      *      *
13         14   *      3626   *       *      *     *      *      *



Fig. 3. Results table (Sec.). Best results are bold-faced. Asterisks (*) represent run 
times exceeding 10,000 sec. 



Remarks for Figures 3 and 4 

1. The time required by BMC to generate the formula is not included in the
results. BMC typically generates the formula in one or two minutes for the
large models, and in several seconds for the small ones. While generating the
formula, the improved BMC generates several files which are needed for
performing the various optimizations.

2. RuleBase supports multiple engines. The presented results were achieved
by the 'classic' SMV-based engine. Yet, a new BDD-based engine that was
recently added to RuleBase (January 2000) performs significantly better on
some of these designs. This engine is based on sophisticated under- and
over-approximation methods that have not yet been published.

3. When comparing RuleBase results to BMC results, one should remember 
that the former has undergone years of development and optimizations, 
which the latter did not yet enjoy. The various optimizations that have 
been presented in this paper can be further improved and tuned. Various 
combinations of the dynamic and static orderings are possible, and it is ex- 
pected that more industrial experience will help in fine tuning them. The 
Design #   vars    clauses   +SM   +SMD   +W50   +W100   +W200   +SM (k-1)
1          9685    55870     3     36     46     46      51      20
2          3628    14468     0.8   0.7    0.7    0.7     0.8     0.4
3          14930   72106     3     1216   8      3       17      934
4          28161   139716    12    26     31     42      61      26
5          9396    41207     2     3      2      3       3       1
6          51654   368367    18    243    111    418     950     28
7          8710    39774     2     1.8    2.5    1.9     2.8     1.3
8          58074   294821    20    123    163    86      105     30
9          63624   326999    25    136    164    153     181     230
10         61088   334861    312   125    70     107     223     1061
11         32109   150027    *     *      *      *       *       *
12         39598   19477     *     *      *      *       *       *
13         13215   6572      *     *      *      *       *       *

Fig. 4. Other, less successful configurations 



implementation of the SAT checker Grasp can also be much improved even 
without changing the search strategy. It was observed by [10] that an efficient 
implementation can be more significant than the decision strategy.® 

6 Conclusions 

1. Neither BDD techniques nor SAT techniques are dominant. Yet, in most 
(10 out of 13) cases the optimized SAT procedure performs significantly 
better. As was stated before, only significant differences in performance are 
meaningful, because normally k is not pre-known. Such differences exist in 
8 of the 10 cases. 

2. The SM, SMP and SMR strategies are better in all cases compared to the 
default procedure adopted by Grasp. The SM strategy seems to be the best 
one. 

3. The static ordering apparently has a stronger impact on the results than the 
strategy for choosing the next subtree. This can be explained by the fact that 
wrong choices of values are corrected ’locally’ when the variable ordering 
follows the dependency graph, as was explained before. Surprisingly, the 
constant decision ’TRUE’, which is the most primitive strategy, proved to be 
the most efficient (in another experiment we tried to solve design #10, which 
is the only one that is solved significantly better by other configurations, 
with a constant decision ’FALSE’. It was solved in about 3 seconds, faster 
than all other configurations). The ’flat’ decision strategy performed better 
only in three cases. The ’Prev’ decision was better than ’flat’ in 6 designs, 
but only once better than the simple constant decision. Yet, it seems to be 

® In [4], SATO [13] was used rather than Grasp. Although in some cases it is faster 
than Grasp, it is restricted in the number of variables it can handle, and seems to 
be less stable. 
more stable in achieving fast results than both of them. As for the sliding
window strategy, Fig. 4 shows that in most cases increasing the window size 
only slows down the search. The surprising success of the constant decision 
strategy can perhaps be attributed to its zero overhead. It can also indicate 
that most bugs in hardware designs can be revealed when the majority of 
the signals are ’on’. Only further experiments can clarify if this is a general 
pattern or an attribute of the specific designs that were examined in the 
benchmark. 

4. Constraint replication (+simulation) requires a small overhead, which does 
not seem to be worthwhile when used in combination with static ordering. 
Yet, it speeds up the standard search based on dynamic ordering. 

This can be explained by the inherent difference between dynamic and static
orderings: suppose that the assignment $x_1 = T$ and $y_{20} = F$ leads to a
conflict, and suppose that their associated decision levels were 10 and 110
respectively when the conflict clause $(\neg x_1 \vee y_{20})$ was added to
$\varphi$. In static ordering, the decision level for each variable remains
constant. As a result, even if the search backtracks to a decision level lower
than 10, the conflict clause will not be effective until the search once again
arrives at decision level 110. In dynamic ordering, on the other hand, there is
a chance that these two variables will be decided much closer to each other,
and therefore the clause will prune the search tree earlier.

Another reason for the difference is related to the typical sizes of backtracking 
in each of the methods. Since conflicts are resolved on a more ’local’ level 
in the SM strategy, conflict clauses (either the original ones or the replica- 
ted clauses) are made of variables which are relatively close to each other 
in terms of their associated decision level. Therefore the non-chronological 
backtracking ’jump’ caused by these clauses is relatively small. 

5. Neither SAT methods nor BDD-based methods have a single dominant
configuration. BDDs can run with or without reordering, with or without
conjunctive partitioning, etc. As for SAT methods, all the optimizations
described in Section 4 can be activated separately, and indeed, as the results
tables demonstrate, different designs are solved better with different
configurations. Given this state of affairs, the most efficient solution, as was
mentioned in the introduction, would be to run several engines in parallel
and present the user with the fastest solution. This architecture will not only
enable the users to run SAT and BDD based tools in parallel, but also to
run these tools under different configurations at the same time, which should
speed up the process of model checking.
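
The parallel architecture suggested in the last conclusion can be sketched as follows. This is only an illustration in Python: the two engine functions are hypothetical stand-ins that sleep instead of verifying anything, and a real tool would presumably launch each engine as a separate process so that the slower ones can be terminated.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Tuple


def run_portfolio(engines: Dict[str, Callable[[str], str]],
                  problem: str) -> Tuple[str, str]:
    """Start every engine on `problem`; return (name, verdict) of the first to finish."""
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(run, problem): name for name, run in engines.items()}
    try:
        first = next(as_completed(futures))
        return futures[first], first.result()
    finally:
        # Threads cannot be killed; the losing engines simply run to completion.
        pool.shutdown(wait=False)


def slow_bdd_engine(problem: str) -> str:  # hypothetical stand-in
    time.sleep(0.3)
    return "fails"


def fast_sat_engine(problem: str) -> str:  # hypothetical stand-in
    time.sleep(0.01)
    return "fails"


winner, verdict = run_portfolio(
    {"bdd": slow_bdd_engine, "sat": fast_sat_engine}, "design-1")
```

The same function can be called with several configurations of one engine, which corresponds to running a tool under different configurations at the same time.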
Abstract. Net unfoldings have attracted much attention as a powerful
technique for combating state space explosion in model checking. The
method has been applied to verification of 1-safe (finite) Petri nets, and
more recently also to other classes of finite-state systems such as
synchronous products of finite transition systems. We show how unfoldings can
be extended to the context of infinite-state systems. More precisely, we
apply unfoldings to get an efficient symbolic algorithm for checking
safety properties of unbounded Petri nets. We demonstrate the advantages
of our method by a number of experimental results.



1 Introduction 

Model Checking has had a great impact as an efficient method for algorithmic 
verification of finite-state systems. A limiting factor in its application is the state 
space explosion problem, which occurs since the number of states grows expo- 
nentially with the number of components inside the system. Therefore, much 
effort has been spent on developing techniques for reducing the effect of state 
space explosion in practical applications. One such technique is that of partial
orders, which is based on the observation that not all interleavings of a given
set of independent actions need to be explored during model checking. Several
criteria for independency have been given, e.g., stubborn sets [Val90], persistent
sets [GW93] or ample sets [Pel93]. A method which has drawn considerable
attention recently is that of unfoldings [McM95,ERV96,ER99]. Unfoldings are 
occurrence nets: unrollings of Petri nets that preserve their semantics. Although 
unfoldings are usually infinite, it is observed in [McM95] that we can always con- 
struct a finite initial prefix of the unfolding which captures its entire behaviour, 
and which in many cases is much smaller than the state space of the system. 
Unfoldings have been applied to n-safe (i.e., finite-state) Petri nets, and more 
recently to other classes of finite-state systems such as synchronous products of 
finite transition systems [LB99,ER99].

In a parallel development, there have been numerous efforts to extend the
applicability of model checking to the domain of infinite-state systems. This has
resulted in several highly nontrivial algorithms for verification of timed auto- 
mata, lossy channel systems, (unbounded) Petri nets, broadcast protocols, rela- 
tional automata, parametrized systems, etc. These methods operate on symbolic 
representations, called constraints each of which may represent an infinite set of 
states. However, in a manner similar to finite-state verification, many of these 
algorithms suffer from a constraint explosion problem limiting their efficiency in 
practical applications. As the interest in the area of infinite-state systems in- 
creases, it will be important to design tools which limit the impact of constraint 
explosion. With this in mind, we have considered [AKP97,AJKP98] a refine- 
ment of the ample set construction and applied it to infinite-state systems such 
as Petri nets and lossy channel systems. 

In this paper, we show how the unfolding technique can be made to work 
in the context of infinite state systems. More precisely, we present an unfolding 
algorithm for symbolic verification of unbounded Petri nets. We adapt an al- 
gorithm described in [ACJYK96] for backward reachability analysis which can 
be used to verify general classes of safety properties. Instead of working on in- 
dividual markings (configurations) of the net (as is the case with the previous 
approaches [McM95,ERV96,ER99,LB99]) we let our unfolding algorithm ope- 
rate on constraints each of which may represent an (infinite) upward closed set 
of markings. We start from a constraint describing a set of “final” markings, 
typically undesirable configurations which we do not want to occur during the 
execution of the net. From the set of final markings we unroll the net backwards,
generating a Reverse Occurrence Net (RON). In order to achieve termination we 
present an algorithm to compute a postfix of the RON, which gives a complete 
characterization of the set of markings from which we can reach a final marking. 
Using concepts from the theory of well quasi-orderings we show that the postfix 
is always finite. In fact, our method offers the same advantages over the algo- 
rithm in [ACJYK96], as those offered by the algorithms of [McM95,ERV96] in 
the context of finite-state systems. 

Based on the algorithm, we have implemented a prototype, whose results on 
a number of simple examples are encouraging. 

Outline In the next section we give some preliminaries on Petri nets. In Sec- 
tion 3 we introduce Reverse Occurrence Nets (RONs). In Section 4 we describe 
the unfolding algorithm. In Section 5 we describe how to compute a finite postfix 
of the unfolding. In Section 6 we report some experimental results. Finally, in 
Section 7 we give some conclusions and directions for future research. 



2 Preliminaries 

Let $\mathbb{N}$ be the set of natural numbers. For $a, b \in \mathbb{N}$, we
define $a \ominus b$ to be equal to $a - b$ if $a \geq b$, and equal to $0$
otherwise. A bag over a set $A$ is a mapping from $A$ to $\mathbb{N}$.
Relations and operations on bags such as $\leq$, $+$, $-$, $\ominus$, etc., are
defined as usual. Sometimes, we write bags as tuples, so $(a, b, a)$ represents
a bag $B$ with $B(a) = 2$, $B(b) = 1$, and $B(d) = 0$ if $d \neq a, b$. A set
$A$ of bags is said to be upward closed if $B_1 \in A$ and $B_1 \leq B_2$ imply
$B_2 \in A$. The upward closure of a bag $A$ is the set
$\{A' \mid A \leq A'\}$. We may interpret a set $S$ as a bag with $S(a) = 1$
if $a \in S$ and $S(a) = 0$ if $a \notin S$. We use $|S|$ to denote the size of
the set $S$. In this paper, we shall use the terminology of [ERV96] as much as
possible.
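
The bag operations above can be illustrated with a small Python sketch. It assumes that bags are represented by `collections.Counter`; the helper names are chosen for this illustration only.

```python
from collections import Counter


def monus(a: int, b: int) -> int:
    """Truncated subtraction on naturals: a - b if a >= b, and 0 otherwise."""
    return a - b if a >= b else 0


def bag(*elements) -> Counter:
    """Build a bag from a tuple-style listing, e.g. bag('a', 'b', 'a')."""
    return Counter(elements)


def bag_leq(b1: Counter, b2: Counter) -> bool:
    """Pointwise comparison b1 <= b2 (missing elements count as 0)."""
    return all(b2[x] >= n for x, n in b1.items())


def bag_monus(b1: Counter, b2: Counter) -> Counter:
    """Pointwise monus; Counter subtraction already truncates at zero."""
    return b1 - b2


def in_upward_closure(candidate: Counter, generator: Counter) -> bool:
    """True if `candidate` lies in the upward closure of the bag `generator`."""
    return bag_leq(generator, candidate)


B = bag("a", "b", "a")  # the bag written (a, b, a) in the text
```

A single bag thus serves as a finite description of an infinite upward closed set, which is the role constraints play in the rest of the paper.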

A net is a triple $(S, T, F)$ where $S$ is a finite set of places, $T$ is a
finite set of transitions, and $F \subseteq (S \times T) \cup (T \times S)$ is
the flow relation. By a node we mean a place or a transition. The preset
${}^\bullet x$ of a node $x$ is the set $\{y \mid (y, x) \in F\}$. The postset
$x^\bullet$ is similarly defined. A marking $M$ is a bag over $S$. We say that
a transition $t$ is enabled in a marking $M$ if ${}^\bullet t \leq M$. We
define a transition relation $\rightarrow$ on the set of markings, where
$M_1 \rightarrow M_2$ if there is $t \in T$ which is enabled in $M_1$ and
$M_2 = M_1 - {}^\bullet t + t^\bullet$. We let $\stackrel{*}{\rightarrow}$
denote the reflexive transitive closure of $\rightarrow$. We say that a marking
$M_2$ is coverable from a marking $M_1$ if $M_1 \stackrel{*}{\rightarrow} M_2'$,
for some $M_2' \geq M_2$. A net system is a tuple
$N = (S, T, F, M_{init}, M_{fin})$, where $(S, T, F)$ is a net and $M_{init}$,
$M_{fin}$ are markings, called the initial and the final marking of $N$
respectively. In this paper, we consider the coverability problem defined as
follows.

Instance A net system $(S, T, F, M_{init}, M_{fin})$.

Question Is $M_{fin}$ coverable from $M_{init}$?

Using standard methods [VW86,GW93], we can reduce the problem of 
checking safety properties for Petri nets to the coverability problem. 
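
To make the definitions of enabledness, firing, and covering concrete, here is a minimal Python sketch. The `Transition` record, the helper names, and the one-transition example net are assumptions of this illustration and are not part of the paper.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Transition(NamedTuple):
    name: str
    pre: Tuple[str, ...]   # preset: input places
    post: Tuple[str, ...]  # postset: output places


def leq(m1: Counter, m2: Counter) -> bool:
    """Pointwise comparison of markings (bags over places)."""
    return all(m2[p] >= n for p, n in m1.items())


def enabled(t: Transition, m: Counter) -> bool:
    """t is enabled in m if its preset is covered by m."""
    return leq(Counter(t.pre), m)


def fire(t: Transition, m: Counter) -> Counter:
    """Forward step: m - preset + postset, defined only when t is enabled."""
    if not enabled(t, m):
        raise ValueError(f"{t.name} is not enabled")
    return m - Counter(t.pre) + Counter(t.post)


def covers(m: Counter, target: Counter) -> bool:
    """True if m >= target, i.e. m witnesses that target is covered."""
    return leq(target, m)


# Hypothetical unbounded net: each firing keeps the token in p and adds one to q.
produce = Transition("produce", pre=("p",), post=("p", "q"))
m = Counter({"p": 1})
for _ in range(3):
    m = fire(produce, m)
```

Because place q can grow without bound, the forward state space of this net is infinite, which is why the paper works backwards from the final marking.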

To solve the coverability problem, we perform a backward reachability
analysis. We define a backward transition relation $\leadsto$ [ACJYK96], such
that, for markings $M_1$ and $M_2$ and a transition $t$, we have
$M_2 \leadsto_t M_1$ if $M_1 = (M_2 \ominus t^\bullet) + {}^\bullet t$. We let
$\leadsto = \bigcup_{t \in T} \leadsto_t$ and let $M \stackrel{k}{\leadsto} M'$
denote that $M = M_0 \leadsto M_1 \leadsto \cdots \leadsto M_k = M'$, for
markings $M_0, \ldots, M_k$. We define $\stackrel{*}{\leadsto}$ to be the
reflexive transitive closure of $\leadsto$. Observe that, for each marking
$M_2$ and transition $t$, there is a marking $M_1$ with $M_2 \leadsto_t M_1$,
i.e., transitions are always enabled with respect to $\leadsto$. The following
lemma relates the forward and backward transition relations.

Lemma 1.

1. If $M_1 \rightarrow M_2$ and $M_2' \leq M_2$ then there is $M_1' \leq M_1$
such that $M_2' \leadsto M_1'$.

2. If $M_2 \leadsto M_1$ and $M_1' \geq M_1$ then there is $M_2'$ such that
$M_2' \geq M_2$ and $M_1' \rightarrow M_2'$.
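
The backward step and the two parts of Lemma 1 can be checked on a small instance with the following Python sketch. Markings are represented as `collections.Counter` objects; the function names and the single transition (one token from p to q) are assumptions of this illustration.

```python
from collections import Counter


def leq(m1: Counter, m2: Counter) -> bool:
    """Pointwise comparison of markings."""
    return all(m2[p] >= n for p, n in m1.items())


def fire(pre: Counter, post: Counter, m: Counter) -> Counter:
    """Forward step m - pre + post; requires the transition to be enabled."""
    if not leq(pre, m):
        raise ValueError("transition is not enabled")
    return m - pre + post


def back(pre: Counter, post: Counter, m2: Counter) -> Counter:
    """Backward step: (m2 monus post) + pre. Always defined."""
    return (m2 - post) + pre


# Hypothetical transition t moving one token from place p to place q.
pre, post = Counter({"p": 1}), Counter({"q": 1})

m1 = back(pre, post, Counter({"q": 2}))      # predecessor of (q, q)
always = back(pre, post, Counter({"r": 1}))  # t produces nothing in r
```

Here `m1` is the marking (p, q), and `always` shows that the backward step is defined even when the target marking contains no token of the postset.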

3 Reverse Occurrence Nets 

In this section we introduce Reverse Occurrence Nets (RONs). A RON
corresponds to “unrolling” a net backwards. Formally, a RON $R$ is a net
$(C, E, F)$ satisfying the following three conditions:

(i) $|c^\bullet| \leq 1$ for each $c \in C$.

(ii) there is no infinite sequence of the form $c_1 F e_1 F c_2 F \cdots$. This
condition implies that there are no cycles in the RON, and that there is a set
$\max(F)$ of nodes which are maximal with respect to $F$.

(iii) $\max(F) \subseteq C$.




498 



P.A. Abdulla, S.P. Iyer, and A. Nylen 



In a RON, the places and transitions are usually called conditions and events
respectively. A set $E$ of events of the RON is considered to be a
configuration if $e \in E$ and $e F^* e'$ imply $e' \in E$.

Remark 1. In [McM95,ERV96], a configuration $E$ is upward closed in the sense
that if an event $e$ belongs to $E$, then all events above $e$ (with respect to
$F$) also belong to $E$. In our case, configurations are downward closed.
Furthermore, in [McM95,ERV96], configurations are required to be conflict free,
i.e., for all events $e_1, e_2 \in E$ we have
${}^\bullet e_1 \cap {}^\bullet e_2 = \emptyset$. Notice that this property is
always satisfied by our configurations, since we demand that
$|c^\bullet| \leq 1$ for each condition.

Consider a net system $N = (S, T, F, M_{init}, M_{fin})$ and a RON $(C, E, F)$,
and let $\mu : C \cup E \rightarrow S \cup T$ such that $\mu(c) \in S$ if
$c \in C$ and $\mu(e) \in T$ if $e \in E$. For a set $C' \subseteq C$ of
conditions, we define $\#C'$ to be a marking such that, for each place $s$, the
value of $\#C'(s)$ is equal to the size of the set
$\{c \in C' \mid \mu(c) = s\}$. In other words $\#C'(s)$ is the number of
conditions in $C'$ labeled with $s$. We say that $(C, E, F, \mu)$ is a
(backward) unfolding of $N$ if the following two conditions are satisfied:
(i) $\#\max(F) = M_{fin}$, i.e., the set of conditions which are maximal with
respect to $F$ corresponds to the final marking; and (ii) $\mu$ preserves the
flow relation, viz., if $(x, y)$ belongs to the flow relation of the RON then
$(\mu(x), \mu(y))$ belongs to the flow relation of $N$.

For a configuration $E$, we define $Cut(E)$ to be the set

$(\{{}^\bullet e \mid e \in E\} \cup \max(F)) - \{e^\bullet \mid e \in E\}$

We define the marking $mark(E) = \#(Cut(E))$.

In Figure I, we show a net system N with seven places, si, . . . , S7, and four 
transitions, ti, . . . , 1 ^. We also show an unfolding^ U of N, assuming a final mar- 
king (si, S7). Examples of configurations in U are E\ = {c2, 64} with mark(Ei) = 
(si, 52,53). and E2 = {61,62,63,64} with mark{E2) = {si,S2,S2,sfl). 




Fig. 1. A net system and one of its unfoldings 



^ To increase readability, we show both names and labels of events in the figure, while 
we omit name of conditions. 
4 An Unfolding Algorithm

We present an algorithm (Figure 2) which, for a given net system
$N = (S, T, F, M_{init}, M_{fin})$, generates an unfolding of $N$ in an
incremental manner. In a manner similar to [ERV96], an unfolding
$U = (C, E, F, \mu)$ is represented as a list of objects corresponding to
conditions and events in the underlying RON. An event $e$ is represented as an
object $(C, t)$ where $t$ is the label $\mu(e)$ of $e$ and $C$ is its set
$e^\bullet$ of post-conditions. A condition $c$ is represented by an object
$(e, s)$ where $s$ is the label $\mu(c)$ of $c$, and $e$ is its (single)
post-event $c^\bullet$. We observe that the flow relation $F$ and the labeling
function $\mu$ are included in the encoding.

Consider a set of conditions $C$ of $U$ to be $t$-enabled provided there exists
a configuration $E$ such that $C \subseteq Cut(E)$ and
$0 < \#C \leq t^\bullet$, i.e., there is a configuration $E$ such that
$C \subseteq Cut(E)$ and all the conditions in $C$ are in the postset of $t$.
Furthermore, consider $C$ to be maximally $t$-enabled provided there is no
other set $C'$ such that $C \subset C' \subseteq Cut(E)$ and $C'$ is
$t$-enabled. We will write $ME_t(C)$ to denote that $C$ is maximally
$t$-enabled. We define $Xtnd(U)$ to be the set of events by which $U$ can be
extended, formally defined as follows:

$Xtnd(U) = \{(C, t) \mid ME_t(C) \mbox{ and } (C, t) \notin U\}$

Observe that the definition implies that there are no redundancies in the unfol- 
ding. In other words we will not have two different events both having the same 
label and the same postcondition. 

The unfolding algorithm is shown in Figure 2. It maintains two variables,
namely the current unfolding $U$ (initialized to the final marking $M_{fin}$)
and a set $X$ of events by which the unfolding can be extended. The algorithm
proceeds by
considering the events in X in turn (this procedure is fair in the sense that each 
event added to X will eventually be considered). At each iteration an event in 
X is picked and moved to U . Furthermore, the possible extensions of the new 
unfolding are computed, using the function Xtnd, and added to X. Notice that 
the algorithm does not necessarily terminate. 

The unfolding algorithm gives a symbolic representation of upward closed
sets from which $M_{fin}$ is coverable. More precisely (Theorem 1), the upward
closure of the markings appearing in $U$ gives exactly the set of markings from
which $M_{fin}$ is coverable. Notice that each event in the unfolding
corresponds to a step in the backward unrolling of the net. The efficiency we
gain through applying unfoldings on upward closed sets, as compared to the
standard symbolic algorithm based on the backward transition relation
$\leadsto$, can be explained in a manner similar to the finite state case
[McM95,ERV96]; namely, the addition of a set of concurrent events to the
unfolding corresponds to an exponential number of applications of the
$\leadsto$ relation.

In the sequel we let $U^i$ denote the value of the variable $U$ after $i$
iterations of the loop. The following lemmas (the proofs of which can be found
in the appendix) relate unfoldings with the backward transition relation
$\leadsto$.
Input: net system $N = (S, T, F, M_{init}, M_{fin})$, where $M_{fin} = (s_1, \ldots, s_n)$.
var $U$: unfolding of $N$; $X$: set of events.

begin
    $U := (s_1, \emptyset), \ldots, (s_n, \emptyset)$
    $X := Xtnd(U)$
    while ($X \neq \emptyset$) do
        Pick and delete $e = (C, t)$ from $X$
        Add $(C, t)$ to $U$ and also add, $\forall s \in {}^\bullet t$, a new condition $(s, e)$ to $U$
        $X := X \cup Xtnd(U)$
end



Fig. 2. Unfolding Algorithm 



Lemma 2. If $M_{fin} \stackrel{k}{\leadsto} M$ then there is an $\ell$ and a
configuration $E$ in $U^\ell$ such that $mark(E) \leq M$.

We now present the lemma in the other direction which shows that the mar- 
king associated with every configuration in an unfolding is backwards reachable. 

Lemma 3. For each $\ell$ and configuration $E$ in $U^\ell$, there is a marking
$M$ such that $M \leq mark(E)$ and $M_{fin} \stackrel{*}{\leadsto} M$.

From Lemma 1, Lemma 2, and Lemma 3 we get the following theorem.

Theorem 1. $M_{fin}$ is coverable from a marking $M$ if and only if there is an
$\ell$ and a configuration $E$ in $U^\ell$ such that $mark(E) \leq M$.

Notice that as a special case we can take $M$ in Theorem 1 to be equal to
$M_{init}$.

5 Termination 

In this section we show how to compute finite postfixes of unfoldings. We define
special types of events which we call cut-off points. In Theorem 2 we show that
cut-off points do not add any markings to the upward closed sets characterized
by the unfolding. This means that, in the unfolding algorithm (Figure 2), we can
safely discard all cut-off points, without ever adding them to the unfolding
$U$. Furthermore, we use concepts from the theory of well quasi-orderings
(Theorem 3) to show that, if all cut-off points are discarded, then the variable
$X$ in the unfolding algorithm eventually becomes empty, implying termination
of the algorithm. We start with some definitions and auxiliary lemmas.

We assume a net system $N$ and an unfolding $U$ of $N$. For an event $e$, we use
$e{\downarrow}$ to denote the configuration $\{e' \mid e F^* e'\}$. For
configurations $E_1$ and $E_2$, we use $E_1 \prec E_2$ to denote that
$|E_1| < |E_2|$ and $mark(E_1) \leq mark(E_2)$. For an event $e$, we say that
$e$ is a cut-off point in $U$ if there is a configuration $E$ in $U$ such that
$E \prec e{\downarrow}$.
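
The cut-off test can be sketched in Python as follows. The sketch assumes that each configuration has already been summarized by its size and its marking; how these summaries are obtained from the unfolding is not shown, and the names `ConfigSummary`, `precedes`, and `is_cutoff` are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ConfigSummary(NamedTuple):
    size: int         # |E|, the number of events in the configuration
    marking: Counter  # mark(E), a bag over places


def leq(m1: Counter, m2: Counter) -> bool:
    """Pointwise comparison of markings."""
    return all(m2[p] >= n for p, n in m1.items())


def precedes(e1: ConfigSummary, e2: ConfigSummary) -> bool:
    """The ordering on configurations: strictly fewer events and a smaller-or-equal marking."""
    return e1.size < e2.size and leq(e1.marking, e2.marking)


def is_cutoff(local_config: ConfigSummary, known: Iterable[ConfigSummary]) -> bool:
    """An event is a cut-off point if some known configuration precedes its local configuration."""
    return any(precedes(k, local_config) for k in known)


known = [ConfigSummary(1, Counter({"s1": 1}))]
candidate = ConfigSummary(3, Counter({"s1": 2, "s2": 1}))
same_size = ConfigSummary(1, Counter({"s1": 1}))
```

The strict inequality on sizes matters: a configuration never makes itself (or one of equal size) a cut-off.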

We recall from the previous section that $U^i$ denotes the value of the variable
$U$ in the unfolding algorithm, after $i$ iterations of the loop. In order to
prove the cutoff theorem we need the following lemma (the proof of which can be
found in the appendix).

Lemma 4. Consider configurations $E_1$, $E_2$, and $E_2'$ in $U^k$ where
$E_1 \prec E_2$ and $E_2 \subseteq E_2'$. There is an $\ell$ and a
configuration $E_1'$ in $U^\ell$ such that $E_1' \prec E_2'$.

Now we are ready to show in the following theorem that cutoff points can be
discarded safely.

Theorem 2. For each $k$ and configuration $E_2$ in $U^k$, there is an $\ell$ and
a configuration $E_1$ in $U^\ell$ where $mark(E_1) \leq mark(E_2)$ and $E_1$
does not contain any cutoff points.

Proof. We use induction on $|E_2|$. The base case is trivial. If $E_2$ does not
contain any cutoff points, then the proof is trivial. Otherwise let $e_2$ be a
cutoff point in $E_2$. Clearly, $e_2{\downarrow} \subseteq E_2$. Since $e_2$ is
a cut-off point, we know that there is a configuration $E$ in $U^k$ such that
$E \prec e_2{\downarrow}$. By Lemma 4 there is an $\ell$ and a configuration
$E_1$ in $U^\ell$ such that $E_1 \prec E_2$, i.e., $|E_1| < |E_2|$ and
$mark(E_1) \leq mark(E_2)$. The claim follows by the induction hypothesis.

To prove termination of the unfolding algorithm, we use the fact that
markings are well quasi-ordered (a consequence of Dickson's lemma [Dic13]),
i.e., for any infinite sequence $M_0, M_1, \ldots$ of markings, there are $i$
and $j$ with $i < j$ and $M_i \leq M_j$.
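
The well quasi-ordering property can be illustrated on a finite prefix of a sequence of markings. The Python sketch below represents a marking as a tuple of token counts over a fixed ordering of places; the function name and the example sequence are assumptions of this illustration.

```python
from typing import List, Optional, Tuple

Marking = Tuple[int, ...]  # token counts over a fixed ordering of the places


def leq(m1: Marking, m2: Marking) -> bool:
    """Pointwise comparison of two markings of the same length."""
    return all(a <= b for a, b in zip(m1, m2))


def find_dominating_pair(seq: List[Marking]) -> Optional[Tuple[int, int]]:
    """Return the first (i, j) with i < j and seq[i] <= seq[j], if any."""
    for j in range(len(seq)):
        for i in range(j):
            if leq(seq[i], seq[j]):
                return i, j
    return None


antichain = [(3, 0), (2, 1), (1, 2), (0, 3)]  # pairwise incomparable
pair = find_dominating_pair(antichain + [(1, 3)])
```

Dickson's lemma guarantees that such a pair exists in every infinite sequence; a finite antichain like the first four markings can only postpone it.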

Theorem 3. The unfolding algorithm terminates if all cut-off points are dis- 
carded. 

Proof. Suppose that the algorithm does not terminate. Since all nodes are
finitely branching we have an infinite sequence $e_0, e_1, e_2, \ldots$ of
events where $e_{i+1} F c_i F e_i$, for some condition $c_i$. Notice that
$|e_j{\downarrow}| > |e_i{\downarrow}|$ whenever $j > i$. By Dickson's lemma,
it follows that there are $i$ and $j$ with $i < j$ and
$mark(e_i{\downarrow}) \leq mark(e_j{\downarrow})$. This implies that $e_j$ is
a cut-off point, which is a contradiction.

Remark Theorem 1, Theorem 2, and Theorem 3 give a complete terminating
procedure for checking coverability in unbounded Petri nets: use the unfolding
algorithm discarding all cutoff points. The final marking $M_{fin}$ is coverable
from the initial marking $M_{init}$ iff a configuration $E$ appears in the
unfolding with $mark(E) \leq M_{init}$.

6 Experimental Results 

In this section we report on some of the issues that we had to solve in im- 
plementing the unfolding algorithm. While our implementation borrows ideas 
from [McM95,ERV96], there are several issues that are peculiar to our backward 
reachability. To wit, they are: 

— Implementation of Xtnd: The abstract algorithm (presented in Section 4)
implies that Xtnd is computed in every iteration. However, in the
implementation a queue of possible sets of conditions that could be the
postset of a (potential) event is maintained. As new conditions are generated
we check whether these new conditions can be added to already existing
(partial) sets of post-conditions to form larger sets of postconditions. By
doing so we reduce a seemingly combinatorial problem to a depth-first search
on the unfolding.

Fig. 3. Examples of nets considered (A. Token Ring; B. Token Ring, version 2)

— Checking termination: As a new event $e$ is generated we calculate
$|e{\downarrow}|$ and $mark(e{\downarrow})$. We compare this information
against $mark(e'{\downarrow})$ and $|e'{\downarrow}|$ for all events $e'$
currently in the unfolding. While our definition of a cut-off event calls for
comparing against all configurations in the unfolding, our implementation is
sound, though not efficient (as otherwise, by Dickson's Lemma
there would be sequence of events such that 6*4-^ Ci+ii)- 

Given that the hypothesis of our paper is

for nets with a great deal of concurrency the storage required to build
(reverse) occurrence nets would be smaller than that of implementations that
consider all interleavings

we compute (a) the maximum number of markings that need to be maintained
for the traditional backward analysis [AJ98] and (b) the total number of nodes
generated by the unfolding algorithm. Given that the storage requirement of a
node is bounded by the storage required for a marking, comparing the number
of markings from backward analysis against the total number of nodes in an
unfolding is appropriate.
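
For comparison, the traditional backward analysis can be sketched as follows. This Python fragment is a simplified illustration of backward coverability over minimal markings, not the implementation used in the experiments; the function names, the representation of transitions as (preset, postset) pairs of `Counter` objects, and the example net are assumptions. It also reports the number of minimal markings kept at the end, which is a simpler quantity than the maximum maintained during the run.

```python
from collections import Counter
from typing import List, Tuple

Net = List[Tuple[Counter, Counter]]  # (preset, postset) per transition


def leq(m1: Counter, m2: Counter) -> bool:
    """Pointwise comparison of markings."""
    return all(m2[p] >= n for p, n in m1.items())


def backward_coverable(net: Net, m_init: Counter, m_fin: Counter) -> Tuple[bool, int]:
    """Return (is m_fin coverable from m_init?, number of minimal markings kept)."""
    basis = [m_fin]     # minimal elements of the upward closed set found so far
    frontier = [m_fin]  # markings whose predecessors are still to be computed
    while frontier:
        m2 = frontier.pop()
        for pre, post in net:
            m1 = (m2 - post) + pre  # backward step (truncated subtraction)
            if any(leq(b, m1) for b in basis):
                continue  # already represented by a smaller marking
            basis = [b for b in basis if not leq(m1, b)] + [m1]
            frontier.append(m1)
    return any(leq(b, m_init) for b in basis), len(basis)


# Hypothetical net: one transition that keeps a token in p and adds one to q.
net = [(Counter({"p": 1}), Counter({"p": 1, "q": 1}))]
target = Counter({"q": 3})
yes = backward_coverable(net, Counter({"p": 1}), target)
no = backward_coverable(net, Counter({"q": 1}), target)
```

Termination of the loop relies on the same well quasi-ordering argument as Theorem 3: no newly added marking is above a marking that is already represented.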

We now report on the results of our experimentation. In Figure 3 we present
two versions of token rings and a buffer. A process in a token ring can be
active when it has a token. We considered several experiments with varying
numbers of tokens available in the ring and varying numbers of processes. For
the buffer example we varied the number of tokens available. The results of our
experimentation are reported in Figure 4, where we provide the time taken in
seconds and the number of nodes/markings using unfoldings and backward
analysis. As can be seen, these results support our hypothesis that when there
is a lot of concurrency in a net, unfoldings require less storage than
traditional backward analysis (which considers all possible interleavings).

7 Conclusions and Future Work 

We have shown how to extend the technique of unfoldings in order to obtain 
an efficient implementation of a symbolic algorithm for verification of unboun- 
ded Petri nets. In contrast to earlier approaches, the algorithm operates over an 
infinite-state space, using constraints to characterize upward closed sets of mar- 
kings. Since our algorithm relies on a small set of properties of Petri nets which 
are shared by other computation models, we believe that our approach can be 
lifted to a more general setting. In particular we aim to develop a theory of unfol- 
dings for well- structured systems [ACJYK96,FS98] a framework which has been 
applied for verification of several types of infinite-state systems such as timed 
automata, broadcast protocols, lossy channel systems, relational automata, etc.
This would allow us to extract common concepts, and provide a guideline for
developing unfolding algorithms for these classes of systems. Another important
direction of future research is the design of efficient data structures for
implementation of the unfolding algorithm, and carrying out experiments to
study the performance of the algorithm on more advanced examples. We hope to
adapt, to our context, reasoning techniques for unfoldings, based on integer
programming and constraints, that have been considered in the literature
[MR97,Hel99].

Fig. 4. Result of experimentation (B. Experimentation on Token Ring, Version 2;
C. Experimentation with buffer). Note: * denotes non-termination after a
reasonable amount of time.
A Proof of Lemmas

Proof of Lemma 2. By induction on $k$.

Base case: Follows from the first step of the algorithm (initializing the value
of $U$).

Induction case: Suppose that $M_{fin} \leadsto^{k} M' \leadsto M$, where the last
step is taken via a transition $t$. By the induction hypothesis there is an
$\ell$ and a configuration $E'$ in $U^{\ell}$ such that $mark(E') \le M'$. Let
$C_1$ be a maximal subset of $Cut(E')$ such that $\#C_1 \le t^{\bullet}$.

There are two cases.

(1) $C_1$ is empty. We define $E = E'$. We show that $mark(E)(s) \le M(s)$ for
each place $s$. We have two subcases.

(1a) If $s \notin t^{\bullet}$, then $mark(E)(s) = mark(E')(s) \le mark(E')(s) + {}^{\bullet}t(s) \le M'(s) + {}^{\bullet}t(s) = M(s)$.

(1b) If $s \in t^{\bullet}$, then we know by maximality of $C_1$ that
$mark(E')(s) = 0$, and hence $mark(E)(s) = mark(E')(s) \le M(s)$.

(2) $C_1$ is not empty. Notice that $ME_t(C_1)$ holds. Given that our algorithm
is fair in selecting transitions that are backward firable, an event
$e = (C_1, t)$ will be chosen and added at some point. We define
$E = E' \cup \{e\}$. Clearly, $E$ is a configuration in the unfolding at that
point. Observe that $mark(E) = (mark(E') \ominus t^{\bullet}) + {}^{\bullet}t$. This means
that $mark(E) = (mark(E') \ominus t^{\bullet}) + {}^{\bullet}t \le (M' \ominus t^{\bullet}) + {}^{\bullet}t = M$. □
Proof of Lemma 3. By induction on $\ell$.

Base case: Follows from the first step of the algorithm (initializing the value
of $U$).

Induction case: Suppose that in step $\ell + 1$ we add an event $e = (C, t)$
(together with its preset) to $U^{\ell}$. Take any configuration $E$ in
$U^{\ell+1}$. If $e \notin E$ then we know that $E$ is also a configuration in
$U^{\ell}$, implying the result by the induction hypothesis. Hence, we can assume
that $E = E' \cup \{e\}$ where $E'$ is a configuration in $U^{\ell}$. By the
induction hypothesis we know that there is a marking $M'$ such that
$M' \le mark(E')$ and $M_{fin} \leadsto^{*} M'$. We define
$M = (M' \ominus t^{\bullet}) + {}^{\bullet}t$. Clearly $M' \leadsto M$ and hence
$M_{fin} \leadsto^{*} M$. Observe that $mark(E) = (mark(E') \ominus t^{\bullet}) + {}^{\bullet}t$.
This means that $mark(E) = (mark(E') \ominus t^{\bullet}) + {}^{\bullet}t \ge (M' \ominus t^{\bullet}) + {}^{\bullet}t = M$. □



Proof of Lemma 4. We show the claim for a configuration $E'_2 = E_2 \cup \{e_2\}$
where $e_2 \notin E_2$. The result follows using induction on $|E'_2| - |E_2|$. Let
$e_2$ be of the form $(C_2, t)$. We know that
$mark(E'_2) = (mark(E_2) \ominus t^{\bullet}) + {}^{\bullet}t$. We define $C_1$ to be a
maximal subset of $Cut(E_1)$ such that $\#C_1 \le t^{\bullet}$.

There are two cases.

1. If $C_1$ is empty we define $E'_1 = E_1$. We have
$|E'_1| = |E_1| \le |E_2| \le |E'_2|$. We show that
$mark(E'_1)(s) \le mark(E'_2)(s)$ for each place $s$. There are two subcases.

1a. If $s \notin t^{\bullet}$, we have $mark(E'_2)(s) = mark(E_2)(s) + {}^{\bullet}t(s) \ge mark(E_1)(s) + {}^{\bullet}t(s) \ge mark(E_1)(s) = mark(E'_1)(s)$.

1b. If $s \in t^{\bullet}$, then by maximality of $C_1$ we know that
$mark(E_1)(s) = 0$. This means that
$mark(E'_2)(s) = (mark(E_2)(s) \ominus t^{\bullet}(s)) + {}^{\bullet}t(s) \ge {}^{\bullet}t(s) \ge mark(E_1)(s) = mark(E'_1)(s)$.

2. If $C_1$ is not empty then, by fairness of the algorithm in selecting
transitions that are backward firable, an event $e_1 = (C_1, t)$ will be chosen
and added at some point. We define $E'_1 = E_1 \cup \{e_1\}$. It is clear that
$E'_1$ is a configuration in the unfolding at that point. We have
$|E'_1| = |E_1| + 1 \le |E_2| + 1 \le |E'_2|$. We know that
$mark(E'_1) = (mark(E_1) \ominus t^{\bullet}) + {}^{\bullet}t$. This means that
$mark(E'_2) = (mark(E_2) \ominus t^{\bullet}) + {}^{\bullet}t \ge (mark(E_1) \ominus t^{\bullet}) + {}^{\bullet}t = mark(E'_1)$. □
Abstract. I describe a systematic method for deductive verification of safety
properties of concurrent programs. The method has much in common with the
“verification diagrams” of Manna and Pnueli [17], but derives from different
intuitions. It is based on the idea of strengthening a putative safety property
into a disjunction of “configurations” that can easily be proved to be
inductive. Transitions among the configurations have a natural diagrammatic
representation that conveys insight into the operation of the program. The
method lends itself to mechanization and is illustrated using a simplified
version of an example that had defeated previous attempts at deductive
verification.



1 Introduction 

In 1997, Shmuel Katz, Patrick Lincoln and I presented an algorithm for Group Members- 
hip together with a detailed, but informal proof of its correctness [14]. Shortly thereafter, 
our colleague Shankar and, independently, Sadie Creese and Bill Roscoe of Oxford Uni- 
versity, noted that the algorithm is flawed when the number of nonfaulty processors is 
three. Model checking a downscaled instance can be effective in finding bugs (that is 
how Creese and Roscoe found the problem in our algorithm [8]), but true assurance for 
a potentially infinite-state n-process algorithm such as this seems to require (mechani- 
cally checked) deductive methods — either direct proof or justification of an abstraction 
that can be verified by algorithmic means. Over the next year or so, Katz, Lincoln and 
I each made several attempts to formalize and mechanically verify a corrected version 
of the algorithm using the PVS verification system [19]. On each occasion, we were 
defeated by the number and complexity of the auxiliary invariants needed, and by the 
“case explosion” that bedevils deductive approaches to formal verification. 

Eventually, I stumbled upon the method presented in this paper and completed the 
verification in April 1999 [23]. This new method made the verification not merely pos- 
sible, but easy, and it provides a visual representation that conveys considerable insight 
into the operation of the algorithm. Holger Pfeifer of the University of Ulm was
subsequently able to use the method to verify a related but much more
complicated group membership algorithm [21] used in the Time Triggered
Architecture for critical real-time control [15].

I later discovered that my method has much in common with the “verification
diagrams” introduced by Manna and Pnueli [17], and subsequently generalized by
Manna and several colleagues [5,7,10,16]. However, the intuition that led to my
method is rather different from that for verification diagrams, as is the way I
approach its mechanization. I hope that by revisiting these methods from a
slightly different perspective, I will help others to see their value and to
investigate their application to new problems.

I describe my method in the next section and present an example of its
application in the one after that. The final section compares the method with
verification diagrams and with other techniques and provides conclusions and
suggestions for further work.

* This research was supported by DARPA through USAF Rome Laboratory Contract
F30602-96-C-0204 and USAF Electronic Systems Center Contract F19628-96-C-0006,
and by the National Science Foundation contract CCR-9509931.

Verification Diagrams Revisited: Disjunctive Invariants for Easy Verification
E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 508-520, 2000.
© Springer-Verlag Berlin Heidelberg 2000



2 The Method 

Concurrent systems are modeled as nondeterministic automata over possibly
infinite sets of states. Given a set of states $S$, an initiality predicate $I$
on $S$, and a transition relation $T$ on $S$, a predicate $P$ on $S$ is
inductive for $\mathcal{S} = (S, I, T)$ if

$$I(s) \supset P(s) \qquad (1)$$ ¹

and

$$P(s) \wedge T(s,t) \supset P(t). \qquad (2)$$

The reachable states are those characterized by the smallest (ordered by
implication) inductive predicate $R$ on $S$. A predicate $G$ is an invariant or
safety property if it is larger than $R$ (i.e., includes all reachable states).
The focus here is on safety (as opposed to liveness) properties, so we do not
need to be concerned with the acceptance criterion on the automaton
$\mathcal{S}$.

The deductive method for verifying safety properties attempts to establish that a 
predicate G is invariant by showing that it is inductive — i.e., we attempt to prove the 
verification conditions (1) and (2) with G substituted for P. The problem, of course, is 
that many safety properties are not inductive, and must be strengthened (i.e., replaced 
by a smaller property) to make them so. Typically, this is done by conjoining additional 
predicates in an incremental fashion, so that G is replaced by 

$$G^{\wedge i} = G \wedge G_1 \wedge \cdots \wedge G_i \qquad (3)$$

until an inductive $G^{\wedge m}$ is found. This process can be made systematic,
but is always tedious. In one well-known example, 57 such strengthenings were
required to verify a communications protocol [12]; each $G_{i+1}$ was discovered
by inspecting a failed proof for inductiveness of $G^{\wedge i}$, and the process
consumed several weeks.
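
The gap between "invariant" and "inductive" is easy to see on a toy finite system. The sketch below is only an illustration under our own assumptions (a six-state counter that steps by two; the names `is_inductive`, `reachable`, `G` and `G_strong` are ours, not the paper's): it checks conditions (1) and (2) by enumeration and shows a safety property that holds on all reachable states but must be strengthened by a conjunct before it becomes inductive.

```python
from typing import Callable, Iterable, List, Set, Tuple

State = int
Pred = Callable[[State], bool]


def is_inductive(states: Iterable[State], init: Pred,
                 trans: List[Tuple[State, State]], pred: Pred) -> bool:
    """Check verification conditions (1) and (2) by enumeration."""
    base = all(pred(s) for s in states if init(s))        # (1): I(s) implies P(s)
    step = all(pred(t) for (s, t) in trans if pred(s))    # (2): P(s) and T(s,t) imply P(t)
    return base and step


def reachable(states: Iterable[State], init: Pred,
              trans: List[Tuple[State, State]]) -> Set[State]:
    """Smallest inductive set: close the initial states under the transitions."""
    reach = {s for s in states if init(s)}
    frontier = list(reach)
    while frontier:
        s = frontier.pop()
        for (u, v) in trans:
            if u == s and v not in reach:
                reach.add(v)
                frontier.append(v)
    return reach


# Toy system (an assumption for illustration): a counter modulo 6 that steps by 2.
STATES = range(6)
TRANS = [(s, (s + 2) % 6) for s in STATES]
init = lambda s: s == 0

G = lambda s: s != 3                          # desired safety property
G_strong = lambda s: s != 3 and s % 2 == 0    # G conjoined with "s is even"
```

Here `G` is an invariant (the reachable states are 0, 2 and 4) but it is not inductive, because the unreachable state 1 satisfies `G` and steps to 3; conjoining "s is even" repairs this.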

Some improvements can be made in this process: static analysis [4] and
automated calculations of (approximations to) fixpoints of weakest
preconditions or strongest postconditions [5] can discover many useful
invariants that can be used to seed the process as $G_1, \ldots, G_i$.
Nonetheless, the transformation of a desired safety property into a provably
inductive invariant remains the most difficult and costly element in deductive
verification, and systematic methods are sorely needed.

¹ Formulas are implicitly universally quantified in their free variables; the
horseshoe symbol $\supset$ denotes logical implication.

The method proposed here is based on strengthening a desired safety property
with a disjunction of additional predicates, rather than the conjunction
appearing in (3). That is, we construct

$$G^{\vee m} = G \wedge (G_1 \vee \cdots \vee G_m)$$

instead of $G^{\wedge m}$. Obviously, this can be rewritten as follows

$$G^{\vee m} = (G \wedge G_1) \vee \cdots \vee (G \wedge G_m).$$

Rather than form each disjunct as a conjunction $(G \wedge G_i)$, it is generally
preferable to use

$$G^{\vee m} = G'_1 \vee \cdots \vee G'_m \qquad (4)$$

and then prove $G'_i \supset G$ for each $G'_i$. The subexpressions $G'_i$ are
referred to as configurations, and the indices $i$ as configuration indices.

Observe that in the construction of $G^{\wedge m}$, each $G_i$ must be an
invariant (the very property we are trying to establish), and that the
inadequacy of $G^{\wedge i}$ only becomes apparent through failure of the
attempted proof of its inductiveness, whereupon proof of the putative
inductiveness of $G^{\wedge i+1}$ must start over.² In contrast, the
configurations used in construction of $G^{\vee m}$ need not themselves be
invariants, and can be discovered in a rather systematic manner. To see this,
first suppose that $G^{\vee m}$ is inductive, and consider the proof obligations
needed to establish this fact. Instantiating (2) with $G^{\vee m}$ of (4) and
case-splitting across the configurations, we will need to prove a verification
condition of the following form for each configuration index $i$.

$$G'_i(s) \wedge T(s,t) \supset G'_1(t) \vee \cdots \vee G'_m(t).$$

We can further case-split on the right of the implication by introducing
predicates $C_{i,j}(s)$ called transition conditions such that, for each
configuration index $i$

$$\forall s \in S : \bigvee_j C_{i,j}(s) \qquad (5)$$

(here $j$ ranges over the indices of the transition conditions for
configuration $G'_i$) and

$$G'_i(s) \wedge T(s,t) \wedge C_{i,j}(s) \supset G'_j(t) \qquad (6)$$

for each transition condition $C_{i,j}$ of each configuration $G'_i$. Note that
some of the $C_{i,j}$ may be identically false (so that the proof obligation (6)
is vacuously true for this case) and that it is not necessary that the
$C_{i,j}$ for different $j$ be disjoint.

This construction can be represented in a diagrammatic form called a
configuration diagram, such as that shown later in Figure 1. Here, each vertex
represents a configuration and is labeled with the name of the corresponding
formula $G'_i$, and each arc represents a non-false transition condition and is
labeled with a phrase that suggests the corresponding predicate. To verify the
diagram, we need to show that the initiality predicate implies some disjunction
of configurations

$$I(s) \supset G'_1(s) \vee \cdots \vee G'_m(s) \qquad (7)$$

(typically there is just a single starting configuration), that each
configuration implies the desired safety property

$$G'_1(s) \vee \cdots \vee G'_m(s) \supset G(s), \qquad (8)$$

that the disjunction of the transition conditions leaving each configuration is
true (i.e., (5)), and that the transition relation indeed relates the
configurations in the manner shown in the diagram (i.e., the verification
conditions (6)). Notice that this is just a new way of organizing a traditional
deductive invariance proof (i.e., the proof obligations (5)-(8) imply (1) and
(2) with $G$ substituted for $P$). And although a configuration diagram has some
of the character of an abstraction, its verification involves only the original
model, and no new verification principles are involved.

² PVS attempts to lessen the amount of rework that must be performed in this
situation by allowing conjectures to be modified during the course of a proof;
such proofs are marked provisional until a final “clean” verification is
completed.
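
To make the four kinds of proof obligation concrete, the following sketch checks (5)-(8) by brute force on a small finite system. It is an illustration under stated assumptions, not the mechanization used in the paper: the function `check_diagram`, the dictionary encoding of configurations and arcs, and the toy counter are all hypothetical, and each arc carries exactly one transition condition.

```python
from typing import Callable, Dict, Iterable, List, Tuple

State = int
Pred = Callable[[State], bool]


def check_diagram(states: Iterable[State], init: Pred,
                  trans: List[Tuple[State, State]], safe: Pred,
                  configs: Dict[str, Pred],
                  arcs: Dict[Tuple[str, str], Pred]) -> Dict[int, bool]:
    """Evaluate proof obligations (5)-(8) by enumeration.

    configs maps a configuration name to its predicate; arcs maps a
    (source, target) pair to the transition condition on that arc.
    """
    states = list(states)
    # (7): every initial state lies in some configuration.
    ob7 = all(any(c(s) for c in configs.values()) for s in states if init(s))
    # (8): every state in a configuration satisfies the safety property.
    ob8 = all(safe(s) for c in configs.values() for s in states if c(s))
    # (5): for each configuration, its transition conditions cover every state.
    ob5 = all(any(cond(s) for (src, _), cond in arcs.items() if src == name)
              for name in configs for s in states)
    # (6): each arc leads from its source configuration into its target.
    ob6 = all(configs[dst](t)
              for (src, dst), cond in arcs.items()
              for (s, t) in trans
              if configs[src](s) and cond(s))
    return {5: ob5, 6: ob6, 7: ob7, 8: ob8}


# Toy system (an assumption): a counter modulo 6 that steps by 2 from 0.
STATES = range(6)
TRANS = [(s, (s + 2) % 6) for s in STATES]
CONFIGS = {"zero": lambda s: s == 0, "even_nonzero": lambda s: s in (2, 4)}
ARCS = {
    ("zero", "even_nonzero"): lambda s: True,
    ("even_nonzero", "even_nonzero"): lambda s: s == 2,
    ("even_nonzero", "zero"): lambda s: s != 2,
}
RESULT = check_diagram(STATES, lambda s: s == 0, TRANS,
                       lambda s: s != 3, CONFIGS, ARCS)
```

Note that neither configuration is required to be an invariant of the system on its own; only their disjunction is. Weakening the condition on the arc back to `zero` to "always true" makes obligation (6) fail, which is the kind of failure one sees when a diagram does not close.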

The previous discussion assumed we already had a configuration diagram; in practice, 
the diagram is constructed incrementally in the course of the proof. To construct a 
configuration diagram, we start by inventing a starting configuration and checking that it 
is implied by the initiality predicate and implies the safety property (i.e., proof obligations
(7) and (8)). Then, by contemplation of the algorithm (the guard predicates and other case- 
splits in the specification are good guides here), we invent some transition conditions for 
the starting configuration and check that their disjunction is true (i.e., proof obligation 
(5)). For each transition condition, we symbolically simulate a step of the algorithm 
from the starting configuration, under that condition. The result of symbolic simulation 
becomes a new configuration (and implicitly discharges proof obligation (6) for that 
case) — unless we recognize it as a variant of an existing configuration, in which case 
we must explicitly discharge proof obligation (6) by proving that the result of symbolic 
simulation implies the existing configuration concerned (sometimes it may be necessary 
to generalize an existing configuration, in which case we will need to revisit previously- 
proved proof obligations involving this configuration to ensure that they are preserved 
by the generalization). We also check that each new or generalized configuration implies 
the safety property (i.e., proof obligation (8)). This process is repeated for each transition 
condition and each new configuration until the diagram is closed. The creative steps are 
the selection of transition conditions, and recognition of new configurations as variants of 
existing ones. Neither of these is hard, given an informal understanding of the algorithm 
being verified, and the resulting diagram not only verifies the desired safety property
(once all its proof obligations are discharged), but it also serves to explain the operation 
of the algorithm in a very effective way. Bugs in the algorithm, or unfortunate choices of 
configurations or of transition conditions, will be manifested as difficulty in closing the 
diagram (typically, the result of a symbolic simulation step will not imply the expected 
configuration). As with most deductive methods, it can be tricky to distinguish between 
these causes of failure. 
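
The incremental construction just described can be mimicked on an explicit finite-state system, where "symbolic simulation" degenerates to computing the image of a set of states. The sketch below is a loose analogy under our own assumptions: `build_diagram`, its parameters, and the representation of a configuration as a set of states are hypothetical, and the creative step of generalizing an existing configuration is not modeled (a result is matched only if it is already contained in an existing configuration).

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

State = int
Pred = Callable[[State], bool]


def build_diagram(init_states: Iterable[State],
                  step: Callable[[State], Iterable[State]],
                  conditions: List[Tuple[str, Pred]],
                  safe: Pred):
    """Grow a configuration diagram from a starting configuration.

    Returns (configs, arcs): configs is a list of state sets, arcs is a list
    of (source index, condition label, target index) triples.
    """
    start = frozenset(init_states)
    if not all(safe(s) for s in start):
        raise ValueError("starting configuration violates the safety property")
    configs: List[FrozenSet[State]] = [start]
    arcs: List[Tuple[int, str, int]] = []
    work = [0]
    while work:
        i = work.pop()
        for label, cond in conditions:
            source = [s for s in configs[i] if cond(s)]
            if not source:
                continue                       # condition is false here: no arc
            # "Simulate" one step from the configuration under this condition.
            image = frozenset(t for s in source for t in step(s))
            if not image:
                continue
            if not all(safe(t) for t in image):
                raise ValueError(f"diagram does not close safely from {i} on {label}")
            for j, existing in enumerate(configs):
                if image <= existing:          # recognized as an existing configuration
                    break
            else:
                configs.append(image)          # otherwise it becomes a new one
                j = len(configs) - 1
                work.append(j)
            arcs.append((i, label, j))
    return configs, arcs
```

On a system with a bug, the `ValueError` plays the role of the "difficulty in closing the diagram" mentioned above; as the text notes, it does not by itself say whether the algorithm or the choice of configurations is at fault.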
3 An Example: Group Membership



A simplified version of the group membership algorithm mentioned earlier [14] will 
serve as an example. There are n processors numbered 0, 1, . . . , n — 1 connected to a 
broadcast bus; a distributed clock synchronization algorithm (not discussed here) pro- 
vides a global clock that ticks off “slots” 0,1,2,... In slot i it is the turn of processor 
i mod n to broadcast. The broadcast contains a message, not considered here, and the 
ack bit of the broadcasting processor, which is described below. Processors may be 
faulty or nonfaulty; those that are faulty may be send-faulty, receive-faulty, or both. A 
processor that is send-faulty will fail to send its broadcast message in its first slot after 
it becomes faulty; thereafter it may or may not broadcast in its slots. A processor that is 
receive-faulty will fail to receive the first broadcast from a nonfaulty processor after it 
becomes faulty; thereafter it may or may not receive broadcasts. Notice that faults affect 
only communications: a faulty processor still executes the algorithm correctly; additio- 
nal elements in the full protocol suite ensure that other kinds of faults are manifested as 
“fail silence,” which appears to the algorithm described here as a combined send- and 
receive-fault in the processor concerned. 

Each processor maintains a membership set which contains all and only the pro- 
cessors that it believes to be nonfaulty. Processors broadcast in their slots only if they 
are in their own membership sets. The goal of the algorithm is to maintain accurate 
membership sets: all nonfaulty processors should have the same membership sets (this 
is the agreement property) and those membership sets should contain all the nonfaulty 
processors and at most one faulty one (this is the validity property; it is necessary to allow 
one faulty processor in the membership because it takes time to diagnose a fault). These 
safety properties must be ensured subject to the fault arrival hypothesis that faults do not 
arrive closer than n slots apart. Initially all processors are nonfaulty, their membership 
sets contain all processors, and their ack bits are true. 

The algorithm is a synchronous one: in each slot one processor broadcasts and all 
the other processors expect to receive its message, provided the broadcaster is in their 
membership sets. Receivers set their ack bits to true in each slot iff they receive an 
expected message. In addition, they remove the broadcaster from their membership sets 
if they fail to receive an expected message (on the interim assumption that the broadcaster 
must have been send-faulty). A receiver that subsequently receives a message carrying 
ack false when its own ack is also false knows that it made the correct decision in this
case (since the current broadcaster also missed the previous expected message), but one 
that receives ack true realizes that it must have been receive-faulty (since the current 
broadcaster did receive the message) and removes itself from its own membership; a 
receiver that fails to receive an expected message when its ack bit is false also removes 
itself from its own membership (because it has missed two expected messages in a 
row, which is consistent with the fault arrival hypothesis only if that processor is itself 
receive-faulty); a receiver that receives a message with ack false when its own ack 
bit is true removes the broadcaster from its membership (since the broadcaster must 
have been receive-faulty on the previous broadcast). Processors that remove themselves 
from their own membership remain silent when it is their turn to broadcast — thereby 
communicating their self-diagnosed receive-faultiness to the other processors. 
Formally, we let mem(p) and ack(p) denote the membership set and ack bit of
processor p. Note that processor p has access to its own mem and ack, and can also read
the value of ack(b), where b = i mod n and i is the current slot number, because this is
sent in the message broadcast in that slot.

Initiality predicate: $\mathit{mem}(p) = \{0, 1, \ldots, n-1\}$, $\mathit{ack}(p) = \mathit{true}$.³

The algorithm is specified by two lists of guarded commands: one for the broadcaster 
and one for the receivers. Primes denote the updated values of the state variables. The 
current slot is i and the current broadcaster is b, where b = i mod n. 

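Broadcaster: Processor $b$ executes the appropriate guarded command from the
following list.

(a) $b \in \mathit{mem}(b) \rightarrow \mathit{mem}(b)' = \mathit{mem}(b),\ \mathit{ack}(b)' = \mathit{true}$

otherwise $\rightarrow$ no change.

Receiver: Each processor $p \neq b$ executes the appropriate guarded command
from the following list. The guards (b)-(g) apply when
$b \in \mathit{mem}(p) \wedge p \in \mathit{mem}(p)$.

(b) $\mathit{ack}(p) \wedge$ no msg rcvd $\rightarrow \mathit{mem}(p)' = \mathit{mem}(p) - \{b\},\ \mathit{ack}(p)' = \mathit{false}$

(c) $\mathit{ack}(p) \wedge \mathit{ack}(b) \rightarrow \mathit{mem}(p)' = \mathit{mem}(p),\ \mathit{ack}(p)' = \mathit{true}$ ⁴

(d) $\mathit{ack}(p) \wedge \neg\mathit{ack}(b) \rightarrow \mathit{mem}(p)' = \mathit{mem}(p) - \{b\},\ \mathit{ack}(p)' = \mathit{true}$

(e) $\neg\mathit{ack}(p) \wedge$ no msg rcvd $\rightarrow \mathit{mem}(p)' = \mathit{mem}(p) - \{p\}$

(f) $\neg\mathit{ack}(p) \wedge \neg\mathit{ack}(b) \rightarrow \mathit{mem}(p)' = \mathit{mem}(p),\ \mathit{ack}(p)' = \mathit{true}$

(g) $\neg\mathit{ack}(p) \wedge \mathit{ack}(b) \rightarrow \mathit{mem}(p)' = \mathit{mem}(p) - \{p\}$

otherwise $\rightarrow$ no change.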
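
The guarded commands can be turned into a small executable model, which is convenient for replaying the scenarios discussed below. This is a sketch under our own assumptions rather than the PVS specification: `slot_step`, `send_ok` and `deaf` are hypothetical names, communication faults are injected by the caller, the ack bit carried in the message is taken to be the broadcaster's value before its own update, and commands (e) and (g) are assumed to leave the ack bit unchanged since the list above does not assign it.

```python
from typing import Dict, FrozenSet, Tuple

Mem = Dict[int, FrozenSet[int]]   # membership set of each processor
Ack = Dict[int, bool]             # ack bit of each processor


def slot_step(mem: Mem, ack: Ack, b: int, send_ok: bool = True,
              deaf: FrozenSet[int] = frozenset()) -> Tuple[Mem, Ack]:
    """Execute one slot with broadcaster b and return the updated (mem, ack).

    send_ok=False models a send fault of b in this slot; processors listed in
    deaf fail to receive in this slot.
    """
    sent = send_ok and b in mem[b]      # b stays silent if not in its own mem
    msg_ack = ack[b]                    # ack bit carried by the message
    new_mem, new_ack = dict(mem), dict(ack)
    if b in mem[b]:
        new_ack[b] = True                               # (a)
    for p in mem:
        if p == b or b not in mem[p] or p not in mem[p]:
            continue                                    # "otherwise": no change
        received = sent and p not in deaf
        if ack[p] and not received:                     # (b)
            new_mem[p] = mem[p] - {b}
            new_ack[p] = False
        elif ack[p] and msg_ack:                        # (c)
            new_ack[p] = True
        elif ack[p]:                                    # (d)
            new_mem[p] = mem[p] - {b}
            new_ack[p] = True
        elif not received or msg_ack:                   # (e) and (g)
            new_mem[p] = mem[p] - {p}
        else:                                           # (f)
            new_ack[p] = True
    return new_mem, new_ack
```

For example, with four processors starting from the initial state, calling `slot_step(mem, ack, 1, send_ok=False)` makes processors 0, 2 and 3 drop processor 1 and clear their ack bits via (b); a following fault-free slot with broadcaster 2 restores their ack bits via (f).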

The environment can perform only a single action: it can cause a new fault to
arrive, provided no other fault has arrived “recently.” Characterization of
“recently” is considered below. We let the_mem denote the current set of
nonfaulty processors, so that the following specifies arrival of a fault in a
previously nonfaulty processor $x$.

Fault Arrival: $\exists x \in \mathit{the\_mem} : \mathit{the\_mem}' = \mathit{the\_mem} - \{x\}$

The desired safety properties are specified as follows.

Agreement: $p \in \mathit{the\_mem} \wedge q \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{mem}(q)$

Validity: $p \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{the\_mem} \vee \exists x : \mathit{mem}(p) = \mathit{the\_mem} \cup \{x\}$

The first says that all nonfaulty processors $p$ and $q$ have the same
membership sets; the second says that the membership set of a nonfaulty
processor $p$ contains all nonfaulty processors, and possibly one faulty one.
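
The two safety properties translate directly into executable predicates. The sketch below is illustrative only: the function names and the dictionary encoding of the membership sets are assumptions, and `procs` is the set of all processor numbers over which the existential quantifier in validity ranges.

```python
from typing import Dict, FrozenSet, Iterable


def agreement(mem: Dict[int, FrozenSet[int]], the_mem: FrozenSet[int]) -> bool:
    """All nonfaulty processors have the same membership set."""
    return all(mem[p] == mem[q] for p in the_mem for q in the_mem)


def validity(mem: Dict[int, FrozenSet[int]], the_mem: FrozenSet[int],
             procs: Iterable[int]) -> bool:
    """Each nonfaulty membership set is the_mem, possibly plus one processor."""
    procs = list(procs)
    return all(mem[p] == the_mem or any(mem[p] == the_mem | {x} for x in procs)
               for p in the_mem)
```

With four processors and processor 1 faulty, a state in which every nonfaulty processor still holds {0, 1, 2, 3} satisfies both properties, whereas a state in which only processor 0 has already dropped processor 1 satisfies validity but not agreement.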

The starting configuration is the following: all nonfaulty processors have
their ack bits true and their membership sets contain just the nonfaulty
processors.

Stable: $p \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{the\_mem} \wedge \mathit{ack}(p) = \mathit{true}$

³ I use the redundant = true because some find that form easier to read.

⁴ This case could be absorbed into the “otherwise” clause with no change to the
algorithm; however, the structure of the algorithm seems clearer written this
way.

It is natural to consider two transition conditions from this configuration:
one where a new fault arrives, and one where it does not. In the latter case,
the broadcaster will leave its state unchanged (no matter whether it executes
command (a) or its “otherwise” case), and the receivers will execute either
their command (c) or their “otherwise” case, and leave their states unchanged.
The overall effect is to remain in the stable configuration. In the case that a
new fault arrives, the same transitions as above will be executed but some
previously nonfaulty processor $x$ will become faulty, leading to the following
configuration.

Latent(x): $x \notin \mathit{the\_mem}$
$\wedge\ p \in \mathit{the\_mem} \cup \{x\} \supset \mathit{mem}(p) = \mathit{the\_mem} \cup \{x\} \wedge \mathit{ack}(p) = \mathit{true}$

There are two transition conditions from latent(x): one where $x$ is the
broadcaster in the next slot, and one where it is a receiver.

In the former case, $x$ will execute its command (a) while all nonfaulty
receivers will note the absence of an expected message and execute their
commands (b), leading to the following configuration.

Excluded_1(x): $x \notin \mathit{the\_mem} \wedge \mathit{mem}(x) = \mathit{the\_mem} \cup \{x\} \wedge \mathit{ack}(x) = \mathit{true}$
$\wedge\ p \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{the\_mem} \wedge \mathit{ack}(p) = \mathit{false}$

In the latter case, a nonfaulty broadcaster will transmit⁵ and its message will
be received by all nonfaulty receivers, but missed by $x$, leading to the
following configuration.

Missed_rcv(x): $x \notin \mathit{the\_mem} \wedge \mathit{mem}(x) = \mathit{the\_mem} \cup \{x\} - \{b\} \wedge \mathit{ack}(x) = \mathit{false}$
$\wedge\ p \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{the\_mem} \cup \{x\} \wedge \mathit{ack}(p) = \mathit{true}$

There are four transition conditions from missed_rcv(x): one where the next
broadcaster is $x$ and it fails to broadcast; one where $x$ does broadcast; one
where the next broadcaster is already faulty; and an “otherwise” case. The first
of these is similar to the transition from latent(x) to excluded_1(x) and leads
to the following configuration.

Excluded_2(x): $x \notin \mathit{the\_mem} \wedge \mathit{mem}(x) = \mathit{the\_mem} \cup \{x\} - \{b\} \wedge \mathit{ack}(x) = \mathit{true}$
$\wedge\ p \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{the\_mem} \wedge \mathit{ack}(p) = \mathit{false}$

We recognize that excluded_1(x) and excluded_2(x) should each be generalized to
yield the following common configuration.

Excluded(x): $p \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{the\_mem} \wedge \mathit{ack}(p) = \mathit{false}$

In the case where $x$ does broadcast, it will do so with ack false, causing
nonfaulty processors to execute their commands (d) and leading directly to the
stable configuration.

⁵ Treatment of the case that the next broadcaster is an already-faulty one
depends on how fault “arrivals” are axiomatized: in one treatment, a fault is
not considered to arrive until it can be manifested (thereby excluding this
case); the other treatment will produce a self-loop on latent(x) in this case.
These details are a standard complication in verification of fault-tolerant
algorithms and are not significant here.
Fig. 1. Configuration Diagram for the Group Membership Example

The case where the next broadcaster is already faulty causes all nonfaulty
processors and processor $x$ to leave their states unchanged (since that
broadcaster will not be in their membership sets), thereby producing a loop on
missed_rcv(x). The remaining case (a broadcast by a nonfaulty processor,
executing its command (a)) will cause nonfaulty receivers to execute their
commands (c), while $x$ will either miss the broadcast (executing its command
(e)), or will discover the true ack bit on the received message and recognize
its previous error (executing its command (g)); in either case, $x$ will exclude
itself from its own membership, leading to the following configuration.

Self_diag(x): $x \notin \mathit{the\_mem} \wedge x \notin \mathit{mem}(x)$
$\wedge\ p \in \mathit{the\_mem} \supset \mathit{mem}(p) = \mathit{the\_mem} \cup \{x\} \wedge \mathit{ack}(p) = \mathit{true}$

The transition conditions from this new configuration are those where $x$ is
the broadcaster, and those where it is not. In the former case, $x$ will fail to
broadcast (since it is not in its own membership), causing nonfaulty processors
to execute their commands (b) and leading to the configuration excluded(x). The
other case will cause them to execute their commands (c), or their “otherwise”
cases, producing a self-loop on the configuration self_diag(x).

The only transitions that remain to be considered are those from configuration
excluded(x). The transition conditions here are the case where the next
broadcaster is already faulty, and that where it is not. The former produces a
self-loop on this configuration, while the latter causes all nonfaulty
receivers to execute their commands (f) while the broadcaster executes its
command (a), leading to a transition to configuration stable.
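
The configurations introduced in this section can also be written as executable predicates, and the obligation that each of them implies the safety properties (proof obligation (8)) can then be checked exhaustively for a small instance. The sketch below is an illustration under our own assumptions: it fixes three processors, reads each configuration as the conjunction of its clause about $x$ with a universally quantified clause about $p$, and omits the clauses that constrain only the faulty processor's own membership set and ack bit, since agreement and validity do not mention them. All names are hypothetical.

```python
from itertools import combinations, product

N = 3
PROCS = range(N)
SUBSETS = [frozenset(c) for r in range(N + 1) for c in combinations(PROCS, r)]


def agreement(mem, the_mem):
    return all(mem[p] == mem[q] for p in the_mem for q in the_mem)


def validity(mem, the_mem):
    return all(mem[p] == the_mem or any(mem[p] == the_mem | {x} for x in PROCS)
               for p in the_mem)


def stable(mem, ack, the_mem, x):
    return all(mem[p] == the_mem and ack[p] for p in the_mem)


def latent(mem, ack, the_mem, x):
    return x not in the_mem and all(
        mem[p] == the_mem | {x} and ack[p] for p in the_mem | {x})


def missed_rcv(mem, ack, the_mem, x):
    return x not in the_mem and all(
        mem[p] == the_mem | {x} and ack[p] for p in the_mem)


def self_diag(mem, ack, the_mem, x):
    return x not in the_mem and x not in mem[x] and all(
        mem[p] == the_mem | {x} and ack[p] for p in the_mem)


def excluded(mem, ack, the_mem, x):
    return all(mem[p] == the_mem and not ack[p] for p in the_mem)


CONFIGS = [stable, latent, missed_rcv, self_diag, excluded]


def obligation_8_holds(configs) -> bool:
    """Check that every state in some configuration is safe (N = 3 only)."""
    for the_mem in SUBSETS:
        for mem in product(SUBSETS, repeat=N):          # mem[p] for each p
            for ack in product([False, True], repeat=N):
                for x in PROCS:
                    if any(c(mem, ack, the_mem, x) for c in configs):
                        if not (agreement(mem, the_mem) and validity(mem, the_mem)):
                            return False
    return True
```

An exhaustive check of this kind covers only the chosen instance size; the deductive proof is what establishes the obligation for arbitrary $n$.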

It is easy to see that the initiality predicate implies the stable configuration and that 
all configurations imply the desired safety properties, and so we have now completed 
construction and verification of the diagram shown in Figure 1. The labels in the vertices
of this diagram indicate the corresponding configuration, while the labels on the arcs are 
intended to suggest the corresponding transition condition. One detail has been glossed 
over in this construction, however: what about the cases where a new fault arrives while 
we are still dealing with a previous fault? In fact, this possibility is excluded in the full 
axiomatization of the fault arrival hypothesis, which states that faults may only arrive 
when the configuration is stable (we then need to discharge trivial proof obligations that 
all the other configurations are disjoint from this one). We connect this axiomatization of 
the fault arrival hypothesis with the “real” one that faults must arrive more than n slots 
apart by proving a bounded liveness property that establishes that the system always 
returns to a stable configuration within n slots of leaving it. This proof requires that 
configurations are embellished with additional parameters and clauses that remember 
the slots on which certain events occurred and count the numbers of self-loop iterations. 
The details are glossed over because they are peripheral to the main concern of
this paper; they are present in the mechanized verification of this example
using PVS, which is available at http://www.csl.sri.com/~rushby/cav00.html and
in a paper that describes verification of the full membership protocol [23].
(The full algorithm differs from the simplified version given here in that all
faulty processors eventually diagnose their faults and exclude themselves from
their own membership; its proof is about four times as long as that presented
here.)⁶

4 Discussion, Comparison, and Conclusion 

The flawed verification of the full membership algorithm in [14] strengthens the desired 
safety properties, agreement and validity, with six additional invariants in an attempt 
to obtain a conjunction that is inductive. Five of these additional invariants are quite 
complicated, such as the following. 

“If a receive fault occurred to processor p less than n steps ago, then either p is
not the broadcaster or ack(p) is false while all nonfaulty q have ack(q) = true,
or p is not in its own membership set.”

The informal proof of inductiveness of the conjoined invariants is long and
arduous, and it must be flawed because the algorithm has a bug in the n = 3
case. This proof resisted several determined attempts to correct and formalize
it in PVS. In contrast, the approach presented here led to a straightforward
mechanized verification of a corrected version of the algorithm.⁷ Furthermore,
as I hope the example has demonstrated, this approach is naturally incremental,
develops understanding of the target algorithm, and yields a diagram that helps
convey that understanding to others. In fact, the diagram (or at least its
outline) can usually be constructed quite easily using informal reasoning, and
then serves as a guide for the mechanized proof.

⁶ The algorithm presented here is fairly obvious; there is a similarly obvious
solution to the full problem (with self-diagnosis) that uses two ack bits per
message; this clarifies the contribution of [14], which is to achieve full
self-diagnosis with only one ack bit per message.

⁷ The verification was completed on a Toshiba Libretto palmtop computer of
decidedly modest performance (75 MHz Pentium with 32 MB of memory).
This approach is strongly related to the verification diagrams and their
associated methods introduced by Manna and Pnueli [17]. These were subsequently
extended and generalized by Manna with Bjørner, Browne, de Alfaro, Sipma, and
Uribe [5,7,10,16]. However, these later methods mostly concern fairness and
liveness properties, or extensions for deductive model checking and hybrid
systems, and so I prefer to compare my approach with the original verification
diagrams. These comprise a set of vertices labeled with formulas and a set of
arcs labeled with transitions that correspond to the configurations and
transition conditions, respectively, of my method. However, there are small
differences between the corresponding notions. First, it appears that
verification diagrams have a finite number of vertices, whereas configurations
can be finite or infinite in number. The example presented in the previous
section is a parameterized system with an unbounded parameter n, and most of
the configurations are parameterized by an individual x selected from the set
{0, 1, . . . , n − 1}, yielding an arbitrarily large number of configurations;
Skolemization (selection of an arbitrary representative) reduces the number of
proof obligations to a finite number. Second, the arcs in verification diagrams
are associated with transitions, whereas those in my approach are associated
with predicates. It is quite possible that this difference is a natural
manifestation of the different examples we have undertaken: those performed
with verification diagrams have been asynchronous systems (where each system
transition corresponds to a transition by some component), whereas I have been
concerned with synchronous systems (where each system transition corresponds to
simultaneous transitions by all components). Thus, in asynchronous systems the
transitions suggest a natural analysis by cases, whereas in synchronous systems
(especially those, as here, without explicit control) the case analysis must be
consciously imposed by selection of suitable transition conditions.

Mechanized support for verification diagrams is provided in STeP [18]: the user 
proposes a diagram and the system generates the necessary verification conditions. PVS 
provides no special support for my approach, but its standard mechanisms are adequate 
because the approach ultimately yields a conventional inductive invariance proof that is 
checked by PVS in the usual way. As illustrated in the example, the configuration dia- 
gram can be constructed incrementally: starting from an existing configuration, the user 
proposes a transition condition and then symbolically simulates a step of the algorithm 
(mechanized in PVS by rewriting and simplification); the result either suggests a new 
configuration or corresponds to (possibly a generalization of) an existing one. Enhan- 
cements to PVS that would better support this activity are primarily improvements in 
symbolic simulation (e.g., faster rewriting and better simplification). 

The key to any inductive invariance proof is to find a partitioning of the state space 
and a way to organize the case analysis so that the overall proof effort is manageable. 
The method of disjunctive invariants is a systematic way to do this that seems effec- 
tive for some problem domains. Other recent methods provide comparably systematic 
constructions for verifications based on simulation arguments: the aggregation method 
of Park and Dill [20] and the completion functions of Hosabettu, Gopalakrishnan and 
Srivas [13] greatly simplify construction of the abstraction functions used in verifying 
cache protocols and processor pipelines, respectively. 

Other methods with some similarity to the approach proposed here are those based 
on abstractions: typically the idea is to construct an abstraction of the original system 
that preserves the properties of interest and that has some special form (e.g., finite state) 
that allows very efficient analysis (e.g., model checking). Methods based on predicate 
abstraction [24] seem very promising [1,3,9,25]. A configuration diagram can be con- 
sidered an abstraction of the original state machine and it is plausible that it could be 
generated automatically by predicate abstraction on the predicates that characterize its 
configurations and transition conditions. However, it is difficult to see how the user 
could obtain sufficient insight to propose these predicates without constructing most 
of the configuration diagram beforehand, and it is also questionable whether fully au- 
tomated theorem proving can construct sufficiently precise abstractions of these fairly 
difficult examples using current technology. 

Such an abstracted system would still have n processes and further reduction would 
be needed to obtain a finite-state system that could be model checked. Creese and Roscoe 
[8] do exactly this for the algorithm of [14] using a technique based on a suitable notion 
of data independence [22]. They use a clever generalization to make the processes of 
the algorithm independent of how they are numbered and are thereby able to establish the 
abstracted n-process case by an induction whose cases can be discharged by model 
checking with FDR. This is an attractive approach with much promise, but formal and 
mechanized justification for the abstraction of the original algorithm still seems quite 
difficult (Creese and Roscoe provide a rigorous but informal argument). 

In summary, the approach presented here is one of a growing number of methods 
for verifying properties of certain classes of algorithms in a systematic manner. Cir- 
cumstances in which this approach seems most effective are those where the algorithm 
concerned naturally progresses through different phases: these give rise to distinct 
disjuncts $G_i$ in a disjunctive invariant $G^{\vee}$ but are correspondingly hard to unify within a 
conjunctive invariant $G^{\wedge}$. Besides those examples already mentioned, the approach has 
been used successfully by Holger Pfeifer to verify another group membership algorithm 
[21]: the very tricky and industrially significant algorithm used in the Time Triggered 
Architecture for safety-critical distributed real-time control [15]. 

The most immediate targets for further research are empirical and, perhaps, theo- 
retical investigations into the general utility of these approaches. The targets of my 
approach have all been synchronous group membership algorithms, while the verifica- 
tion diagrams of Manna et al. seem not to have been applied to any hard examples (the 
verification in STeP of an interesting Leader Election algorithm [6] did not use diagram- 
matic methods). If practical experience with a variety of different problem types shows 
the approach to have sufficient utility, then it will be worth investigating provision of 
direct mechanical support. 
Abstract. In this paper, we discuss the verification of a microprocessor 
involving a reorder buffer, a store buffer, speculative execution and 
exceptions at the microarchitectural level. We extend the earlier propo- 
sed Completion Functions Approach [HSG98] in a uniform manner to 
handle the verification of such microarchitectures. The key extension to 
our previous work was in systematically extending the abstraction map 
to accommodate the possibility of all the pending instructions being 
squashed. An interesting detail that arises in doing so is how the com- 
mutativity obligation for the program counter is proved despite the pro- 
gram counter being updated by both the instruction fetch stage (when a 
speculative branch may be entertained) and the retirement stage (when 
the speculation may be discovered to be incorrect). Another interesting 
detail pertains to how store buffers are handled. We highlight a new 
type of invariant in this work — one which keeps correspondence between 
store buffer pointers and reorder buffer pointers. All these results, ta- 
ken together with the features handled using the completion functions 
approach in our earlier published work [HSG98,HSG99,HGS99], demonstrate 
that the approach is uniformly applicable to a wide variety of 
pipelined designs. 



1 Introduction 

Formal Verification of pipelined processor implementations against instruction 
set architecture (ISA) specifications is a problem of growing importance. A sig- 
nificant number of processors being sold today employ advanced features such as 
out-of-order execution, store buffers, exceptions that cause pending uncommit- 
ted instructions to be squashed, and speculative execution. Recently a number 
of different approaches [HSG99,McM98,PA98] have been used to verify simple 
out-of-order designs. To the best of our knowledge, no single formal verification 
technique has been shown to be capable of verifying processor designs that support 
all of these features and also apply to other processors such as those that 
perform out-of-order retirement. In this paper, we report our successful applica- 
tion of the Completion Functions Approach to verify an out-of-order execution 
design with a reorder buffer, a store buffer, exceptions and speculation, using 
the PVS [ORSvH95] theorem-prover, taking only a modest amount of time for 
the overall proof. This result, taken together with the earlier published applicati- 
ons of the completion functions approach [HSG98,HSG99,HGS99], demonstrates 
that the approach is uniformly applicable to a wide variety of pipelined designs. 

One of the challenges posed in verifying a combination of the above mentioned 
advanced features is that the resulting complex interaction between data and 
control usually overwhelms most automatic methods, whether based on model 
checking or decision procedures. One of the main contributions of this work is 
that we develop a way of cleanly decomposing the squashing of instructions from 
normal execution. These decomposition ideas are applicable to theorem proving 
or model checking or combined methods. 

Our basic approach is one of showing that any program run on the specifica- 
tion and the implementation machines returns identical results. This verification 
is, in turn, achieved by identifying an abstraction map ABS that relates implementation 
states to corresponding specification states. The key to making the above 
technique work efficiently in practice is a proper definition of ABS. As we showed 
in our earlier work [HSG98], one should ideally choose an approach to constructing 
ABS that is not only simple and natural to carry out, but also derives other 
advantages, the main ones being modular verification that helps localize errors, 
and verification reuse that allows lemmas proved about certain pipeline stages 
to be used as rewrite rules in proving other stages. In [HSG98], we introduced 
such a technique to define ABS called the Completion Functions Approach. In 
subsequent work [HSG99,HGS99,Hos99], we demonstrated that the completion 
functions approach can be applied uniformly to a wide variety of examples that 
include various advanced pipelining features. An open question in our previous 
work was whether combining out-of-order execution with exceptions and specu- 
lation would make the task of defining completion functions cumbersome and 
the approach impractical. 

In this paper, we demonstrate that the completion functions approach is robust 
enough to be used effectively for such processors, that is, (i) the specification 
of completion functions is still natural, amounting to expressing knowledge 
that the designer already has; (ii) verification proceeds incrementally, facilitating 
debugging and error localization; (iii) mistakes made in specifying completion 
functions never lead to false positives; and (iv) verification conditions and most 
of the supporting lemmas needed to finish a proof can be generated systemati- 
cally, if not automatically. They can also be discharged with a high degree of 
automation using strategies based on decision procedures and rewriting. These 
observations are supported by our final result: a processor design supporting 
superscalar execution, store buffers, exceptions, speculative branch prediction, 
and user and supervisor modes could be fully verified in 265 person hours. This, 
we believe, is a modest investment in return for the significant benefits of design 
verification. 

Some of the highlights of the work we report are as follows. Given that our 
correctness criterion is one of showing a commutativity obligation between im- 
plementation states and specification states, the abstraction map used in the 
process must somehow accommodate the possibility of instructions being squas- 
hed. We show how this is accomplished. This leads us to a verification condition 
with two parts, one pertaining to the processor states being related before and 
after an implementation transition, and the other relating to the squashing pre- 
dicate itself. Next, we show how the commutativity obligation for the program 
counter is obtained despite the program counter being updated by both the in- 
struction fetch stage (when a speculative branch may be entertained) and the 
retirement stage (when the speculation may be discovered to be incorrect). We 
also show how the store buffer is handled in our proof. We detail a new type of 
invariant in this work, which was not needed in our earlier works. This invariant 
keeps correspondence between store buffer pointers and reorder buffer pointers. 



2 Processor Model 

At the specification level, the state of the processor is represented by a register 
file, a special register file accessed only by privileged/special instructions, a data 
memory, a mode flag, a program counter and an instruction memory. The pro- 
cessor operating mode (one of user/supervisory) is maintained in the mode flag. 
User mode instructions are an alu instruction for performing arithmetic and 
logical operations, load and store instructions for accessing the data memory, 
and a beq instruction for performing conditional branches. Three additional privileged 
instructions are allowed in the supervisory mode: an rfeh instruction for 
returning from an exception handler, and mfsr and mtsr instructions for moving 
data from and to the special register file. Three types of exceptions are possible: 
an arithmetic exception raised by an alu instruction, a data access exception raised 
by load and store instructions when the memory address is outside the legal bounds 
(two special registers maintain the legal bounds, and this is checked only 
in user mode), and an illegal instruction exception. When an exception is raised, 
the processor saves the address of the faulting instruction in a special register 
and jumps to an exception handler, assuming supervisory mode in the process. 
After processing a raised exception, the processor returns to user mode via the 
rfeh instruction. 

An implementation model of this processor is shown in Figure 1. A reorder 
buffer, implemented as a circular FIFO queue with its tail pointing to the earliest 
issued instruction and head pointing to the first free location in the buffer, is 
used to maintain program order, to permit instructions to be committed in that 
order. Register translation tables (regular and special) provide the identity of 
the latest pending instruction writing a particular register. “Alu/Branch/Special 
Instr. Unit” (referred to as ABS Unit) executes alu, beq and all the special 
instructions. The reservation stations hold the instructions sent to this unit 
Fig. 1. The block diagram model of our implementation 



until they are ready to be dispatched onto an appropriate execution unit. These 
instructions are executed out of program order by the multiple execution units 
present in the ABS Unit. Instructions load and store are issued to the “Load 
Store Unit” (referred to as LS Unit) where the reservation stations form a circular 
FIFO queue storing the instructions in their program order. (Again, tail points 
to the earliest instruction and head points to the first free reservation station.) 
These instructions are executed in their program order by the single execution 
unit present in the LS Unit. For a store instruction, the memory address and 
the value to be stored are recorded in an entry in the store buffer, and the value 
is later written into the data memory. The store buffer is again implemented 
as a circular FIFO queue, with head and tail pointers, keeping the instructions 
to be written to the data memory in their program order. When two store 
buffer entries refer to the same memory address, the latest one has a flag set. 
A load instruction first attempts an associative search in the store buffer using 
the memory address. If multiple store buffer entries have the same address, the 
search returns the value of the latest entry. If the search does not find a matching 
entry, the data for that address is returned from the data memory. A scheduler 
controls the movement of the instructions through the execution pipeline (such 
as being dispatched, executed etc.) and its behavior is modeled by axioms (to 
allow us to concentrate on the processor “core”). Instructions are fetched from 
the instruction memory using a program counter; and the implementation also 
takes a no_op input, which suppresses an instruction fetch when asserted. 
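
The store buffer lookup described above can be illustrated with a small executable model. This is only a sketch under stated assumptions, not the PVS model of the paper: the class name StoreBuffer and its methods are hypothetical, the commit status of entries is ignored, and the "latest entry" flag is replaced by a youngest-first search, which gives the same answer for a load.

```python
from collections import deque


class StoreBuffer:
    """Toy FIFO store buffer (illustrative names, not taken from the paper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # oldest store on the left, youngest on the right

    def push(self, addr, value):
        """Record a store in program order."""
        if len(self.entries) >= self.capacity:
            raise OverflowError("store buffer full")
        self.entries.append((addr, value))

    def drain_one(self, memory):
        """Write the oldest pending store into the data memory."""
        if self.entries:
            addr, value = self.entries.popleft()
            memory[addr] = value

    def load(self, addr, memory):
        """Return the youngest buffered value for addr, else the memory value."""
        for a, v in reversed(self.entries):  # youngest entry first
            if a == addr:
                return v
        return memory.get(addr, 0)  # assumption: unwritten memory reads as 0


mem = {4: 10}
sb = StoreBuffer(4)
sb.push(4, 20)
sb.push(8, 5)
sb.push(4, 30)
latest = sb.load(4, mem)  # forwarded from the youngest matching entry
```

Searching from the youngest entry is a design shortcut for the sketch; a hardware implementation would more plausibly use a per-entry flag, as the text describes.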

An instruction is issued by allocating an entry for it at the head of the reorder 
buffer and (depending on the instruction type) either a free reservation station 
(sch_new_slot) in the ABS Unit or a free reservation station at the head of the 
queue of reservation stations in the LS Unit. If the instruction being issued is a 
branch instruction, then the program counter is modified according to a predicted 
branch target address (sch_pred_target, an unconstrained arbitrary value), and 
in the next cycle the new instruction is fetched from this address. No instruction 
is issued if there are no free reservation stations/reorder buffer entries or if no_op 
is asserted or if the processor is being restarted (for reasons detailed later). 
The RTT entry corresponding to the destination of the instruction is updated 
to reflect the fact that the instruction being issued is the latest one to write 
that register. If the source operands are not being written by previously issued 
pending instructions (checked using the RTT) then their values are obtained 
from the register file, otherwise the reorder buffer indices of the instructions 
providing the source operands are maintained (in the reservation station). Issued 
instructions wait for their source operands to become ready, monitoring all the 
execution units if they produce the values they are waiting for. An instruction 
can be dispatched when its source operands are ready and a free execution unit 
is available. In the case of the LS Unit, only the instruction at the tail of the queue 
of reservation stations is dispatched. As soon as an instruction is dispatched, 
its reservation station is freed. The dispatched instructions are executed and 
the results are written back to their respective reorder buffer entries as well as 
forwarded to those instructions waiting for this result. If an exception is raised 
by any of the executing instructions, then a flag is set in the reorder buffer 
entry to indicate that fact. In case of a store instruction, the memory address 
and the value to be stored are written into a store buffer entry instead of the 
reorder buffer entry when the store instruction does not raise an exception 
(other information such as the “ready” status etc. are all written into the reorder 
buffer entry). The control signals from the scheduler determine the timings of 
this movement of the instructions in the execution pipeline. 

The instruction at the tail of the reorder buffer is committed to the architec- 
turally visible components, when it is done executing (at a time determined by 
sch_retire_rb?). If it is a store instruction, then the corresponding store buffer 
entry is marked committed and later written into the data memory (at a time 
determined by sch_sb_retire_mem?). Also, if the RTT entry for the destination 
of the instruction being retired is pointing to the tail of the reorder buffer, then 
that RTT entry is updated to reflect the fact that the value of that register is 
in the appropriate register file. If the instruction at the tail of the reorder buffer 
has raised an exception or if it is a mis-predicted branch or if it is an rfeh instruction, 
then the rest of the instructions in the reorder buffer are squashed and 
the processor is restarted by resetting all of its internal (non-observable) state. 

^ Multiple instructions can be simultaneously dispatched, executed and written back 
in one clock cycle. However, for simplicity, we do not allow multiple instruction issue 
or retirement in a single clock cycle. 
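
The register translation table (RTT) bookkeeping at issue and at retirement, as described in this section, can be sketched as follows. The function names issue and retire, the dictionary representation and the operand tags are assumptions made for illustration; squashing and the special register file are left out.

```python
def issue(rtt, regfile, rob_index, dest, sources):
    """Issue one instruction (hypothetical helper, not the paper's PVS).

    Each source operand is returned either as ("value", v), read from the
    register file, or as ("wait", rbi), naming the pending producer.
    """
    operands = []
    for reg in sources:
        if reg in rtt:  # a pending instruction will write this register
            operands.append(("wait", rtt[reg]))
        else:
            operands.append(("value", regfile[reg]))
    # Sources are looked up before the RTT is updated, so an instruction
    # that reads and writes the same register sees the older producer.
    rtt[dest] = rob_index
    return operands


def retire(rtt, regfile, rob_index, dest, result):
    """Commit a result; clear the RTT entry only if it still names this entry."""
    regfile[dest] = result
    if rtt.get(dest) == rob_index:
        del rtt[dest]


regfile = {"r1": 1, "r2": 2, "r3": 0}
rtt = {}
ops0 = issue(rtt, regfile, 0, "r3", ["r1", "r2"])
ops1 = issue(rtt, regfile, 1, "r3", ["r3", "r1"])
retire(rtt, regfile, 0, "r3", 3)  # entry 1 is still the latest writer of r3
```

After the first retirement the RTT still maps r3 to entry 1, which is the situation the text guards against when it checks that the RTT entry points to the tail.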

3 The Completion Functions Approach 

The key idea in proving the correctness of pipelined microprocessors is to discover 
a formal correspondence between the execution of the implementation and the 
specification machines. The completion functions approach suggests a way of 
constructing this abstraction in a manner that leads to an elegant decomposition 
of the proof. In the first subsection, we briefly discuss the correctness criterion 
we use. In the second subsection, we describe the different steps in constructing 
a suitable abstraction function for the example under consideration. In the third 
subsection, we discuss how to decompose the proof into verification conditions, 
the proof strategies used in discharging these obligations, and the invariants 
needed in our approach. The PVS specifications and the proofs can be found at 
[Hos99]. 

3.1 Correctness Criterion 

We assume that the pipelined implementation and the ISA-level specification 
are provided in the form of transition functions, denoted by I_step and A_step 
respectively. The specification machine state is made up of certain components 
chosen from the implementation machine called the observables. The function 
projection extracts these observables given an implementation machine state. 
The state where the pipelined machine has no partially executed instructions is 
called a flushed state. 

We regard a pipelined processor implementation to be correct if the behavior 
of the processor starting in a flushed state, executing a program, and termina- 
ting in a flushed state is matched by the ISA level specification machine whose 
starting and terminating states are in direct correspondence with those of the 
implementation processor through projection. This criterion is shown in Fi- 
gure 2(a) where n is the number of implementation machine transitions in a 
run of the pipelined machine and m corresponds to the number of instructions 
executed in the specification machine by this run. An additional correctness cri- 
terion is to show that the implementation machine is able to execute programs 
of all lengths, that is, it does not get into a state where it refuses to accept any 
more new instructions. In this paper, we concentrate on proving the correctness 
criterion expressed in Figure 2(a) only. 

The criterion shown in Figure 2(a) spanning an entire sequential execution 
can be established with the help of induction once a more basic commutativity 
obligation shown in Figure 2(b) is established on a single implementation machine 
transition. This criterion states that if the implementation machine starts 
in an arbitrary state q and the specification machine starts in a corresponding 
specification state (given by an abstraction function ABS), then after executing 
a transition their new states correspond. A_step_new stands for zero or more 
applications of A_step. The number of instructions executed by the specification 
machine corresponding to an implementation transition is given by a user-defined 
synchronization function. Our method further verifies that the ABS function 
chosen corresponds to projection on flushed states, that is, ABS(fs) = 
projection(fs) holds on flushed states, thus helping debug ABS. The user may 
also need to discover invariants to restrict the set of implementation states con- 
sidered in the proof of the commutativity obligation and prove that it is closed 
under I_step. 
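
To make the commutativity obligation concrete, the following sketch checks it exhaustively on a deliberately tiny machine. Everything here is an assumption chosen for illustration (a single accumulator, a two-entry buffer of pending instructions, the names a_step, i_step and abs_map); it is not the processor of this paper, only an instance of the same proof shape.

```python
from itertools import product

MOD = 8   # accumulator values are taken modulo 8 (toy assumption)
CAP = 2   # capacity of the pending-instruction buffer (toy assumption)
INSTRS = [("add", 1), ("add", 3), ("mul", 2), ("mul", 3)]


def a_step(acc, instr):
    """Specification machine: execute one instruction on the accumulator."""
    op, k = instr
    return (acc + k) % MOD if op == "add" else (acc * k) % MOD


def i_step(q, instr):
    """Implementation cycle: retire the oldest pending instruction and issue
    `instr` if one is offered and the buffer had room at the start of the
    cycle. Returns the next state and whether an instruction was issued."""
    acc, pending = q
    room = len(pending) < CAP
    if pending:
        acc, pending = a_step(acc, pending[0]), pending[1:]
    issued = instr is not None and room
    if issued:
        pending = pending + (instr,)
    return (acc, pending), issued


def abs_map(q):
    """Abstraction: complete all pending instructions in program order."""
    acc, pending = q
    for instr in pending:
        acc = a_step(acc, instr)
    return acc


def commutes(q, instr):
    """ABS(I_step(q)) must equal A_step applied 0 or 1 times to ABS(q)."""
    q2, issued = i_step(q, instr)
    expected = abs_map(q)
    if issued:  # the synchronization function returns 1 exactly when issuing
        expected = a_step(expected, instr)
    return abs_map(q2) == expected


def all_states():
    for acc in range(MOD):
        for n in range(CAP + 1):
            for pending in product(INSTRS, repeat=n):
                yield (acc, pending)


ok = all(commutes(q, i) for q in all_states() for i in INSTRS + [None])
```

On flushed states (no pending instructions) abs_map reduces to reading the accumulator, which mirrors the check that ABS agrees with projection on flushed states.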



[Figure: commutative diagrams (a) and (b); surviving labels: projection, m A_step] 

Fig. 2. Pipelined microprocessor correctness criterion 



The crux of the problem here is to define a suitable abstraction function 
relating an implementation state to a specification state. The completion func- 
tions approach suggests a way of doing this in a manner that leads to an elegant 
decomposition of the proof. We now detail how this is achieved for our example 
processor. 



3.2 Compositional Construction of the Abstraction Function 

The first step in defining the abstraction function is to identify all the unfinished 
instructions in the processor and their program order. In this implementation, 
the processor (when working correctly) stores all the currently executing instruc- 
tions in their program order in the reorder buffer. We identify an instruction in 
the processor with its reorder buffer index, that is, we refer to the instruction at 
reorder buffer index rbi as just instruction rbi^. In addition to these, the store 
buffer has certain committed store instructions yet to be written into the data 
memory, recorded in their program order. These store instructions are not as- 
sociated with any reorder buffer entry and occur earlier in the program order 
than all the instructions in the reorder buffer. 

^ Brief explanation of some of the notation used throughout the rest of the paper: q refers 
to an arbitrary implementation state, s the scheduler output, i the processor input, 
I_step(q, s , i) the next state after an implementation transition. We sometimes refer 
to predicates and functions defined without explicitly mentioning their arguments, 
when this causes no confusion. 
Having determined the program order of the unfinished instructions, the second 
step is to define a completion function for every unfinished instruction in 
the pipeline. Each completion function specifies the desired effect on the obser- 
vables of completing a particular unfinished instruction assuming those that are 
ahead of it (in the program order) are completed. The completion functions, 
which map an implementation state to an implementation state, leave all 
non-observable state components unchanged. However, not every instruction in the 
pipeline gets executed completely and updates the observables. If an instruc- 
tion raises an exception or if the target address is mis-predicted for a branch 
instruction, then the instructions following it must be squashed. To specify this 
behavior, we define a squashing predicate for every unfinished instruction that is 
true exactly when the unfinished instruction can cause the subsequent instruc- 
tions (in the program order) to be squashed. The completion function for a given 
instruction updates the observables only if the instruction is not squashed by 
any of the instructions preceding it. 

We now elaborate on specifying the completion functions and the squashing 
predicates for the example under consideration. An unfinished instruction rbi 
in the processor can be in one of the following seven phases of execution: issued 
to the ABS Unit or to the LS Unit (issued_abs or issued_lsu), dispatched in either of 
these units (dispatched_abs or dispatched_lsu), executed in either of these units 
(executed_abs or executed_lsu), or written back to the reorder buffer (writtenback). 
A given unfinished instruction is in one of these phases at any given time and the 
information about this instruction (the source values, destination register, etc.) 
is held in the various implementation components. For each instruction phase 
“ph”, we define a predicate “Instr_ph?” that is true when a given instruction is in 
phase “ph”, a function “Action_ph” that specifies what ought to be the effect of 
completing an instruction in that phase, and a predicate “Squash_rest?_ph” that 
specifies the conditions under which an instruction in that phase can squash all 
the subsequent instructions. We then define a single parameterized completion 
function and squashing predicate (applicable to all the unfinished instructions 
in the reorder buffer) as shown in box 1. We similarly define a (parameterized) 
completion function for the committed store instructions in the store buffer. 
These store instructions can only be in a single phase, that is, committed, 
and they do not cause the subsequent instructions to be squashed. (A store 
instruction that raises an exception is not entered into the store buffer.) 



% state_I: impl. state type. rbindex: reorder buffer index type.      (box 1)
Complete_instr(q: state_I, rbi: rbindex, kill?: bool): state_I =
  IF kill? THEN q
  ELSIF Instr_writtenback?(q,rbi) THEN Action_writtenback(q,rbi)
  ELSIF Instr_executed_lsu?(q,rbi) THEN Action_executed_lsu(q,rbi)
  ELSIF ... Similarly for other phases ... ENDIF

Squash_rest?_instr(q: state_I, rbi: rbindex): bool =
  IF Instr_writtenback?(q,rbi) THEN Squash_rest?_writtenback(q,rbi)
  ELSIF Instr_executed_lsu?(q,rbi) THEN Squash_rest?_executed_lsu(q,rbi)
  ELSIF ... Similarly for other phases ... ENDIF

In this implementation, when an instruction is in the writtenback phase, 
its reorder buffer entry has the result value and destination of the instruction, 
and also enough information to determine whether it has raised any excepti- 
ons or has turned out to be a mis-predicted branch. Action_writtenback and 
Squash_rest?_writtenback are then defined using this information about the 
instruction. Similarly, we define the “Action”s and the “Squash_rest?”s for the 
other phases. When an instruction is in an execution phase where it has not 
yet read its operands, the completion function obtains the operands by sim- 
ply reading them from the observables. The justification is that the completion 
functions are composed in their program order in constructing the abstraction 
function (described below), and so we talk of completing a given instruction in 
a context where the instructions ahead of it are completed. 

% Complete_Squash_rest?_till returns a tuple.                        (box 2)
% proj_1 and proj_2 extract the first and the second components.
% rbindex_p is a type ranging from 0 to the size of the reorder buffer.
Complete_Squash_rest?_till(q: state_I, rbi_ms: rbindex_p):
    RECURSIVE [state_I,bool] =
  IF rbi_ms = 0 THEN (q, FALSE)
  ELSE LET t = Complete_Squash_rest?_till(q,rbi_ms-1),
           x = proj_1(t), y = proj_2(t) IN
    (Complete_instr(x,measure_fn_rbi(q,rbi_ms),y),         % 1st component.
     Squash_rest?_instr(x,measure_fn_rbi(q,rbi_ms)) OR y)  % 2nd one.
  ENDIF
MEASURE rbi_ms

Complete_till(q: state_I, rbi_ms: rbindex_p): state_I =
  proj_1(Complete_Squash_rest?_till(Complete_committed_in_sb_till(
           q, lsu_sb_commit_count(q)), rbi_ms))

Squash_rest?_till(q: state_I, rbi_ms: rbindex_p): bool =
  proj_2(Complete_Squash_rest?_till(Complete_committed_in_sb_till(
           q, lsu_sb_commit_count(q)), rbi_ms))

% state_A is the specification state type.
ABS(q: state_I): state_A = projection(Complete_till(q,rb_count(q)))

The final step is to construct the abstraction function (that has the cumula- 
tive effect of flushing the pipeline) by completing all the unfinished instructions 
in their program order. A given instruction is to be killed, that is, the kill? 
argument of Complete_instr is true, when the squashing predicate is true for 
any of the instructions ahead of that given instruction. In order to define an 
ordering among the instructions, we define a measure function rbi_measure_fn 
that associates a measure with every instruction in the reorder buffer such that 
the tail has measure one and successive instructions have a measure one greater 
than the previous instruction. So the instructions with lower measures occur ear- 
lier in the program order than instructions with higher measures. The function 
measure_fn_rbi returns the reorder buffer index of the instruction with the given 
measure. To define the abstraction function, we first define a recursive function 
Complete_Squash_rest?_till that completes the instructions and computes the 
disjunction of the squashing predicates from the tail of the reorder buffer till a 
given unfinished instruction, as shown in box 2. Complete_committed_in_sb_till 
is a similar recursive function that completes all the committed store instructions 
in the store buffer. We can then define the abstraction function by first 
completing all the committed instructions in the store buffer (they are earlier 
in the program order than any instruction in the reorder buffer) and then completing 
all the instructions in the reorder buffer. So we define Complete_till 
and Squash_rest?_till as shown in box 2, and then in constructing the abstraction 
function ABS, we instantiate the Complete_till definition with the measure 
of the latest instruction in the reorder buffer. The implementation variable 
rb_count maintains the number of instructions in the reorder buffer, and hence 
corresponds to the measure of the latest instruction. 
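
The recursion described above can be sketched in executable form. The Python
below is only an illustration under simplifying assumptions: the state is
reduced to a register-file dictionary, the reorder buffer is a list whose first
element is the tail (measure one), the base case is taken at measure zero, and
complete_instr and squash_rest_instr are hypothetical stand-ins for
Complete_instr and Squash_rest?_instr rather than the PVS definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction: writes `value` to register `dest`; `squashes`
# marks an instruction whose completion squashes all later instructions.
Instr = Dict[str, object]
State = Dict[str, int]  # toy register file only


def complete_instr(state: State, instr: Instr, kill: bool) -> State:
    """Stand-in for Complete_instr: a killed instruction has no effect."""
    if kill:
        return state
    new_state = dict(state)
    new_state[instr["dest"]] = instr["value"]
    return new_state


def squash_rest_instr(state: State, instr: Instr) -> bool:
    """Stand-in for Squash_rest?_instr (toy: a flag on the instruction)."""
    return bool(instr["squashes"])


def complete_squash_rest_till(state: State, rb: List[Instr], ms: int) -> Tuple[State, bool]:
    """Complete the instructions with measures 1..ms (rb[0] has measure 1).

    Returns the resulting state and the disjunction of the squashing
    predicates of those instructions.
    """
    if ms == 0:  # assumed base case: nothing to complete
        return state, False
    x, y = complete_squash_rest_till(state, rb, ms - 1)
    instr = rb[ms - 1]
    return complete_instr(x, instr, y), squash_rest_instr(x, instr) or y


def abs_fn(state: State, rb: List[Instr]) -> State:
    """Toy abstraction function: complete everything in the reorder buffer."""
    return complete_squash_rest_till(state, rb, len(rb))[0]
```

In this sketch an instruction after a squashing one is completed with its kill
argument set, so it leaves the register file untouched, which mirrors the role
of the second tuple component in the PVS recursion.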

3.3 Decomposing the Proof 

The proof of the commutativity obligation is split into different cases based
on the structure of the synchronization function. In this example, the
synchronization function returns zero when the processor is restarted, or if
the squashing predicate evaluates to true for any of the instructions in the
reorder buffer (i.e., Squash_rest?_till(q,rb_count(q)) is true), or if no new
instruction is issued. Otherwise it returns one, and we consider each of these
cases separately. We discuss proving the commutativity obligation for register
file rf and program counter pc only. The proofs for the special register file,
mode flag and data memory are similar to that for rf, though in the case of
data memory, one needs to take into account the additional details regarding
the committed instructions in the store buffer. The proof for instruction
memory is straightforward as it does not change at all.
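
The case split of the synchronization function can be written down as a short
sketch. The Python below assumes the three conditions are available as Boolean
inputs; the function and parameter names are illustrative and do not come from
the PVS specification.

```python
def sync_fn(restart_proc: bool, squash_rest: bool, new_instr_issued: bool) -> int:
    """Number of specification steps that match one implementation step.

    squash_rest stands for Squash_rest?_till(q, rb_count(q)).
    """
    if restart_proc or squash_rest or not new_instr_issued:
        return 0
    return 1
```

Each of the zero-returning conditions, and the remaining case that returns
one, corresponds to one group of cases in the proof.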

We first consider an easy case in the proof of the commutativity obligation 
(for rf), that is, when the processor is being restarted in the current cycle 
(restart_proc is true). 

- The processor discards all the executing instructions in the reorder buffer,
  and sets rb_count and lsu_sb_commit_count to zero. So Complete_till
  will be vacuous on the implementation side of the commutativity obligation
  (the side on which I_step(q,s,i) occurs), and the expression on the
  implementation side simplifies to rf(I_step(q,s,i)).

- Whenever the processor is being restarted, the instruction at the tail of
  the reorder buffer is causing the rest of the instructions to be squashed,
  so Squash_rest?_till(q,1) ought to be true. (Recall that the tail of the
  reorder buffer has measure one.) We prove this, and then from the definition
  of Complete_Squash_rest?_till given earlier, it follows that the kill?
  argument is true for all the remaining instructions in the reorder buffer,
  and hence these do not affect rf. Also, the synchronization function returns
  zero when the processor is being restarted. So the expression on the
  specification side of the commutativity obligation simplifies to
  rf(Complete_till(q,1)).

- We show that rf(I_step(q,s,i)) and rf(Complete_till(q,1)) are
  indeed equal by expanding the definitions occurring in them and simplifying.

Now assume that restart_proc is false. We first postulate certain verification
conditions, prove them, and then use them in proving the commutativity
obligation. Consider an arbitrary instruction rbi. Though the processor
executes the instructions in an out-of-order manner, it commits the
instructions to the observables only in their program order. This suggests that
the effect on rf of completing all the instructions till rbi is the same in the
states q and I_step(q,s,i). Similarly, the truth value of the disjunction of
the squashing predicates till rbi is the same in the states q and
I_step(q,s,i). This verification condition, Complete_Squash_rest?_till_VC, is
shown in listing [3]. It is proved by induction on rbi_ms, the measure
corresponding to instruction rbi (a footnote below discusses how this measure
is adjusted).

% valid_rb_entry? predicate tests whether rbi is within the              [3]
% reorder buffer bounds.

Complete_Squash_rest?_till_VC: LEMMA
  FORALL(rbi_ms: rbindex): LET rbi = measure_fn_rbi(q, rbi_ms) IN
    ((valid_rb_entry?(q, rbi) AND NOT restart_proc?(q, s, i)) IMPLIES (
      rf(Complete_till(q, rbi_ms)) = rf(Complete_till(I_step(q, s, i), rbi_ms)) AND
      Squash_rest?_till(q, rbi_ms) = Squash_rest?_till(I_step(q, s, i), rbi_ms)))

As in the earlier proofs based on the completion functions approach
[HSG99,HSG98], we decompose the proof of Complete_Squash_rest?_till_VC
into different cases based on how an instruction makes a transition from its
present phase to its next phase. Figure 3 shows the phase transitions for an
instruction rbi in the reorder buffer (when the processor is not restarted),
where the predicates labeling the arcs define the conditions under which those
transitions take place. The figure also shows the three transitions for a new
instruction entering the processor pipeline. Having identified these
predicates, we prove that those transitions indeed take place in the
implementation machine. For example, we prove that an instruction rbi in phase
dispatched_lsu (D_lsu in the figure) goes to executed_lsu phase in
I_step(q,s,i) if the Execute_lsu? predicate is true, otherwise it remains in
dispatched_lsu phase.

We now return to the proof of Complete_Squash_rest?_till_VC and consider
the induction argument (i.e., rbi_ms is not equal to 1). The proof outline is
as follows:

- Expand the Complete_till and the Squash_rest?_till definitions on both
  sides of Complete_Squash_rest?_till_VC and unroll the recursive definition
  of Complete_Squash_rest?_till once.

- Consider the first conjunct (i.e., the one corresponding to rf). The kill?
  argument to Complete_instr is Squash_rest?_till(q,rbi_ms-1) on the left
  hand side and Squash_rest?_till(I_step(q,s,i),rbi_ms-1) on the right
  hand side, and these have the same truth value by the induction hypothesis.
  When it is true, the left hand side reduces to

      rf(Complete_till(q, rbi_ms - 1))

  and the right hand side to

      rf(Complete_till(I_step(q, s, i), rbi_ms - 1))

  which are equal by the induction hypothesis. When it is false, the proof
  proceeds as in our earlier work [HSG99]. We consider the possible phases
  rbi can be in and whether or not it makes a transition to its next phase.
  Assume rbi is in dispatched_abs phase and the predicate Execute_abs? is
  true. Then, in I_step(q,s,i), rbi is in executed_abs phase. By the definition
  of Complete_instr, the left hand side of the verification condition
  simplifies to rf(Action_dispatched_abs(Complete_till(q,rbi_ms-1), rbi_ms))
  and the right hand side reduces to
  rf(Action_executed_abs(Complete_till(I_step(q,s,i),rbi_ms-1), rbi_ms)).
  The proof now proceeds by expanding these “Action” function definitions,
  using the necessary invariant properties and simplifying. The induction
  hypothesis will be used to infer that the register file contents in the two
  states Complete_till(q,rbi_ms-1) and Complete_till(I_step(q,s,i),rbi_ms-1)
  are equal, as those two terms appear when the “Action” definitions are
  expanded. Overall, the proof decomposes into 14 cases for the seven phases
  rbi can be in.

- Consider the second conjunct of Complete_Squash_rest?_till_VC. Using
  the induction hypothesis, this reduces to showing that the two predicates
  Squash_rest?_instr(Complete_till(q,rbi_ms-1), rbi_ms) and
  Squash_rest?_instr(Complete_till(I_step(q,s,i),rbi_ms-1), rbi_ms) have the
  same truth value. This proof again proceeds as before by a case analysis on
  the possible phases rbi can be in and whether or not it makes a transition
  to its next phase. The proof again decomposes into 14 cases for the seven
  phases rbi can be in.

Footnote: Since the measure function is dependent on the tail of the reorder
buffer, and since the tail can change during an implementation transition, the
measure needs to be adjusted on the right hand side of
Complete_Squash_rest?_till_VC to refer to the same instruction. This is a
detail which we ignore in this paper for the ease of explanation, and use just
rbi_ms.

Fig. 3. The various phases an instruction can be in and transitions between
them (when the processor is not being restarted). Also, the three transitions
for an instruction entering the processor are shown.

For the program counter, however, it is not possible to relate its value in
states q and I_step(q,s,i) by considering the effect of instructions one at a
time in their program order as was done for rf. This is because I_step updates
pc if a new instruction is fetched, either by incrementing it or by updating
it according to the speculated branch target address, but this new instruction
is the latest one in the program order. However, if the squashing predicate is
true for any of the executing instructions in the reorder buffer, then
completing that instruction modifies the pc with a higher precedence, and the
pc ought to be modified in the same way in both q and I_step(q,s,i). This
observation suggests a verification condition on pc, shown in listing [4]. This
verification condition is again proved by an induction on rbi_ms, and its proof
is decomposed into 14 cases based on the instruction phase transitions as in
the earlier proofs.

pc_remains_same_VC: LEMMA                                               % [4]
  FORALL(rbi_ms: rbindex): LET rbi = measure_fn_rbi(q, rbi_ms) IN
    (valid_rb_entry?(q, rbi) AND NOT restart_proc?(q, s, i) AND
     Squash_rest?_till(q, rbi_ms)) IMPLIES
      pc(Complete_till(q, rbi_ms)) = pc(Complete_till(I_step(q, s, i), rbi_ms))

Now we come to the proof of the commutativity obligation, where we use the
above lemmas after instantiating them with rb_count. We consider the remaining
cases in the definition of the synchronization function in order:
Squash_rest?_till(q,rb_count(q)) is true, no new instruction is issued, or one
of the three transitions corresponding to a new instruction being issued takes
place, as shown in Figure 3.

- When Squash_rest?_till(q,rb_count(q)) is true, the kill? argument for
  the new instruction fetched (if any) will be true in I_step(q,s,i) since
  Squash_rest?_till has the same truth value in states q and I_step(q,s,i).
  Hence on the implementation side of the commutativity obligation, there is
  no new instruction executed. On the specification side, the synchronization
  function returns zero, so A_step_new is vacuous. The proof can then be
  accomplished using Complete_Squash_rest?_till_VC (for the register file) and
  pc_remains_same_VC (for the program counter).

- The proof when no new instruction is issued or when one is issued is similar
  to the proof in our earlier work [HSG99]. For example, if the issued
  instruction is in issued_lsu phase in I_step(q,s,i), then we have to prove
  that completing this instruction according to Action_issued_lsu has the same
  effect on the observables as executing a specification machine transition.

Correctness of the feedback logic: Whenever there are data dependencies
among the executing instructions, the implementation keeps track of them and
forwards the results of the execution to all the waiting instructions. The
correctness of this feedback logic, both for the register file and the data
memory, is expressed in a similar form as in our earlier work [HSG99]. For
example, a load instruction obtains the value from the store buffer if there is
an entry with the matching address (using associative search), otherwise it
reads the value from the data memory. Consider the value obtained when all the
instructions ahead of the load instruction are completed, and then the data
memory is read. This value and the value returned by the feedback logic ought
to be equal. The verification condition for the correctness of the feedback
logic for data memory is based on the above observation. It will be used in the
proof of the commutativity obligation, and the proof of this verification
condition itself is based on certain invariants.
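
The observation behind this verification condition can be illustrated with a
small sketch. The Python below is a toy model, not the verified design: the
store buffer is a list of (address, value) pairs with the oldest store first,
the associative search is assumed to pick the youngest matching entry, and
unwritten memory locations are assumed to read as zero.

```python
from typing import Dict, List, Tuple

Store = Tuple[int, int]  # (address, value), oldest store first


def forwarded_load(store_buffer: List[Store], memory: Dict[int, int], addr: int) -> int:
    """Value returned by the (toy) feedback logic for a load from addr."""
    for st_addr, st_value in reversed(store_buffer):  # youngest entry first
        if st_addr == addr:
            return st_value
    return memory.get(addr, 0)


def load_after_completion(store_buffer: List[Store], memory: Dict[int, int], addr: int) -> int:
    """Reference value: complete every earlier store, then read the memory."""
    completed = dict(memory)
    for st_addr, st_value in store_buffer:  # program order
        completed[st_addr] = st_value
    return completed.get(addr, 0)
```

The verification condition corresponds to these two functions agreeing for
every address; the sketch only lets one check that on concrete examples.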



Invariants needed: Many of the invariants needed like the exclusiveness and 
the exhaustiveness of instruction phases, and the invariants on the feedback 
logic for the register file and data memory are similar to the ones needed in our 
earlier work [HSG99]. We describe below one invariant that was not needed in 
the earlier work. 

The LS Unit executes the load and store instructions in their program order.
These instructions are stored in their program order in the reservation
stations in the LS Unit and in the store buffer. It was necessary to use these
facts during the proof, and they were expressed as follows (for the reservation
stations in the LS Unit): Let rsi1 and rsi2 be two instructions in the
reservation stations in the LS Unit, and let rsi1_ptr and rsi2_ptr point to the
reorder buffer entries corresponding to these instructions respectively. Let
lsu_rsi_measure_fn be a measure function defined on the LS Unit reservation
station queue similar to rbi_measure_fn. If rsi1 has a lower/higher measure
than rsi2 according to lsu_rsi_measure_fn, then rsi1_ptr has a lower/higher
measure than rsi2_ptr according to rbi_measure_fn.
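
As a concrete reading of this invariant, the sketch below checks it on an
explicit snapshot. It assumes each reservation station entry is given as a
hypothetical pair (measure under lsu_rsi_measure_fn, measure of its reorder
buffer entry under rbi_measure_fn) and that measures are distinct.

```python
from itertools import combinations
from typing import List, Tuple


def order_preserved(entries: List[Tuple[int, int]]) -> bool:
    """True if the two measures order every pair of entries the same way."""
    return all(
        (a[0] < b[0]) == (a[1] < b[1])
        for a, b in combinations(entries, 2)
    )
```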



PVS proof effort organization: This exercise was carried out in four phases. 
In the first phase, we “extrapolated” certain invariants and properties from the 
earlier work, and this took 27 person hours. In the second phase, we formulated 
and proved the invariants and certain other properties on the store buffer, and 
this took 54 person hours. In the third phase, we formulated and proved all the 
verification conditions about the observables and the commutativity obligation, 
and this took 131 person hours. In the fourth phase, we proved the necessary 
invariants about the feedback logic and its correctness, and this took 53 person 
hours. So the entire proof was accomplished in 265 person hours. 



Related work: There is one other reported work on formally verifying the
correctness of a pipelined processor of comparable complexity. In [SH99],
Sawada and Hunt construct an explicit intermediate abstraction in the form of a
table called MAETT, express invariant properties on this and prove the final
correctness from these invariants. They report taking 15 person months. Also,
their approach is applicable to fixed size instantiations of the design only.
Various other approaches have been proposed recently to verify out-of-order
execution processors [McM98,PA98,JSD98,BBCZ98,CLMK99,BGV99], but none of these
has so far been demonstrated on examples with a similar set of features as we
have handled.



4 Experimental Results and Concluding Remarks 



Example verified      Effort spent doing the proof
EX1.1 and EX1.2       2 person months
EX3.1                 13 person days
EX2.1                 19 person days
EX3.2                 7 person days
EX2.2                 34 person days

Table 1. Examples verified and the effort needed.



We have applied our methodology to verify six example processors exhibiting
a wide variety of implementation issues, and implemented our methodology in
PVS [ORSvH95]. Our results to date are summarized in Table 1. This table
summarizes the manual effort spent on each of the examples, listing them in the
order we verified them. The first entry includes the time to learn PVS (by the
first author, who did all the verification work). Each verification effort
built on the earlier efforts, and reused some of the ideas and the proof
machinery.

The processor described in this paper is listed as EX2.2. In contrast, EX1.1
is a five-stage pipeline implementing a subset of the DLX architecture, EX1.2
is a dual-issue version of the same architecture, and EX2.1 is a processor with
a reorder buffer and only arithmetic instructions. We also considered two
examples that allowed out-of-order completion of instructions: EX3.1 allowed
certain arithmetic instructions to bypass certain other arithmetic instructions
when their destinations were different, and EX3.2 implemented Tomasulo's
algorithm without a reorder buffer and with arithmetic instructions only.

In conclusion, the completion functions approach can be used effectively to
verify a wide range of processors against their ISA-level specifications. We
have articulated a systematic procedure by which a designer can formulate a
very intuitive set of completion functions that help define the abstraction
function, and then showed how such a construction of the abstraction function
leads to decomposition of the proof of the commutativity obligation. We have
also presented how the designer can systematically address details such as
exceptions and feedback logic. Design iterations are also greatly facilitated
by the completion functions approach due to the incremental nature of the
verification, as changes to a pipeline stage do not cause ripple-effects of
changes across the whole specification; global re-verification can be avoided
because of the layered nature of the verification conditions. Our future work
will be directed at overcoming the current limitations of the completion
functions approach, by seeking ways to automate invariant discovery, especially
pertaining to the control logic of processors.

536 R. Hosabettu, G. Gopalakrishnan, and M. Srivas

1 Introduction and Motivation

For the foreseeable future, industrial hardware design will continue to use both 
simulation and model checking in the design verification process. To date, these 
techniques are applied in isolation using different tools and methodologies, and 
different formulations of the problem. This results in cumulative high cost and 
little (if any) cross-leverage of the individual advantages of simulation and formal 
verification. 

With the goal of effectively and advantageously exploiting the co-existence of
simulation and model checking, we have developed a tool called FoCs (“Formal
Checkers”). FoCs, implemented as an independent component of the RuleBase
toolset, takes RCTL¹ properties as input and translates them into VHDL
programs (“checkers”) which are integrated into the simulation environment and
monitor simulation on a cycle-by-cycle basis for violations of the property.

Checkers, also called Functional Checkers, are not a new concept:
manually-written checkers are a traditional part of simulation environments
(cf. [CB+99]). Checkers facilitate massive random testing, because they
automate test results analysis. Moreover, checkers facilitate the analysis of
intermediate results, and therefore save debugging effort by identifying
problems directly, “as they happen”, and by pointing more accurately to the
source of the problems.

However, the manual writing and maintenance of checkers is a notoriously 
high-cost and labor-intensive effort, especially if the properties to be verified are 
complex temporal ones. For instance, in the case of a checker for a design with 
overlapping transactions (explained in Section 3), writing a checker manually is 
an excruciating error-prone effort. 

Observing the inefficient process of manual checker writing in ongoing IBM
projects has inspired the development of FoCs, as a means for automatically
generating checkers from formal specifications. For each property of the
specification, represented as an RCTL formula, FoCs generates a checker for
simulation. This checker essentially implements a state machine which will
enter an error state in a simulation run if the formula fails to hold in this
run. The next section will describe the checker generation process in more
detail.

¹ RCTL includes a rich and useful set of CTL safety formulas and regular
expressions; see [BBL98].

Experience with FoCs in multiple projects has been very favorable in terms of 
verification cost and quality. Verification effort is reduced by leveraging the same 
formal rules for model checking of small design blocks as well as for simulation 
analysis across all higher simulation levels. An equally important benefit of FoCs 
is the conciseness and expressiveness of RCTL formulas. Formulas consisting of 
just a few lines can efficiently represent complex and subtle cases, which would 
require many lines of code if described in a language such as VHDL. This makes 
maintenance, debugging, porting and reuse of specifications and checkers highly 
cost-effective. 

2 Tool Architecture and Implementation 

Figure 1 shows the overall environment in which FoCs operates. The user pro- 
vides a design to be verified, as well as formal specifications and a set of test 
programs generated either manually or automatically. FoCs translates the formal 
specification into checkers, which are then linked to the design and simulated 
with it. During simulation, the checker produces indications of property viola- 
tions. It is up to the user to decide what action to take: fix the design, the 
property, or the simulation environment. 




Fig. 1. FoCs Environment 

FoCs translates RCTL into VHDL as follows: First, each property is translated
into a non-deterministic finite automaton (NFA) and a simple AG(p) formula,
where p is a Boolean expression. The NFA has a set of distinguished error
states, and the formula specifies that the NFA will never enter an error state
(entering an error state means that the design does not adhere to the
specification under the test conditions). The translation details are described
in [BBL98].

Since contemporary simulators do not support non-determinism, the NFA
has to be converted into a deterministic automaton (DFA). The DFA, in turn,
is translated into a VHDL process: the FoCs checker. The AG(p) formula is
translated into a VHDL Assert(p) statement that prints a message when the
VHDL process reaches a state where the underlying property is violated, and
possibly stops simulation.

The number of states of the DFA may be exponential in the number of
states of the NFA, but simulation is sensitive to the size of the representation
(the number of VHDL lines) rather than to the number of states. The number
of VHDL lines in the resulting VHDL checker is at most quadratic in the size
of the property. Practically, it is almost always linear because of the types of
properties that people tend to write.
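
The NFA-to-DFA step can be illustrated with the textbook subset construction.
The Python below is a generic sketch and not the FoCs implementation: the
example automaton, its state names and the three mutually exclusive input
symbols ("req", "ack", "none") are invented for illustration, and the checker
simply reports the first cycle in which an error state is reached.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

NFA = Dict[Tuple[str, str], Set[str]]  # (state, symbol) -> successor states
DFA = Dict[Tuple[FrozenSet[str], str], FrozenSet[str]]

# Hypothetical property: every "req" must be followed by "ack" in the next cycle.
EXAMPLE_NFA: NFA = {
    ("s0", "req"): {"s0", "s1"},   # non-deterministic: keep waiting, or track this req
    ("s0", "ack"): {"s0"},
    ("s0", "none"): {"s0"},
    ("s1", "req"): {"err"},
    ("s1", "none"): {"err"},
}


def determinize(nfa: NFA, start: str, alphabet: Iterable[str]) -> Tuple[DFA, FrozenSet[str]]:
    """Subset construction: each DFA state is a frozenset of NFA states."""
    symbols = list(alphabet)
    start_set = frozenset([start])
    dfa: DFA = {}
    todo = [start_set]
    seen = {start_set}
    while todo:
        current = todo.pop()
        for sym in symbols:
            target = frozenset(t for s in current for t in nfa.get((s, sym), ()))
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return dfa, start_set


def first_violation(dfa: DFA, start_set: FrozenSet[str],
                    error_states: Set[str], trace: List[str]) -> Optional[int]:
    """Return the first cycle at which an error state is reached, or None."""
    current = start_set
    for cycle, sym in enumerate(trace):
        current = dfa[(current, sym)]
        if current & error_states:
            return cycle
    return None
```

Only the subsets reachable from the start state are generated, which is one
reason the deterministic automaton is often far smaller in practice than the
exponential worst case.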

The above translation process is implemented within the RuleBase model 
checker [BBEL96]. 

3 Example 

The following example will demonstrate the conciseness and ease of use of RCTL
for checker writing and the advantage of automatic checker generation. Assume
that the following property is part of the specification: If a transaction that
starts with tag t has to send k bytes, then at the end of the transaction
k bytes have been sent. The user can formulate this property in RCTL as
follows:

    forall k: ²
    {*, start & start_tag=t & tosend=k, !end*, end & end_tag=t} -> {sent=k}

Manual writing of a checker for this property may become complicated if
transactions may overlap, which means that a new transaction may start while
previous transactions are still active. The checker writer has to take into
consideration all possible combinations of intervals and perform non-trivial
bookkeeping. The RCTL formula is evidently much more concise and readable than
the resulting VHDL file or a manually written VHDL or C program.
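
To give a feel for the bookkeeping involved, the sketch below is a small
hand-written checker for the informal property. It is an illustration only and
not FoCs output: it assumes each simulation cycle is a dictionary of signal
values, that transactions are matched by tag, that sent is sampled in the cycle
where end is asserted, and that a transaction does not start and end in the
same cycle. It does not claim to reproduce the exact RCTL semantics.

```python
from collections import defaultdict
from typing import Dict, List


def check_transactions(trace: List[Dict[str, int]]) -> List[int]:
    """Return the cycles at which a transaction ends with sent != tosend.

    Absent signals count as 0 (not asserted).
    """
    pending = defaultdict(list)  # tag -> byte counts promised by open transactions
    violations = []
    for cycle, sig in enumerate(trace):
        if sig.get("end"):
            expected = pending.pop(sig["end_tag"], [])
            if any(k != sig.get("sent", 0) for k in expected):
                violations.append(cycle)
        if sig.get("start"):
            pending[sig["start_tag"]].append(sig["tosend"])
    return violations
```

Even this simplified version needs a per-tag table of open transactions, which
is the kind of state the RCTL formula leaves implicit.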



4 Using FoCs for Coverage Analysis 

The quality of simulation-based verification depends not only on thorough
checking of results, but also on the quality of the tests used (a.k.a. input
patterns or test vectors) [KN96]. FoCs checkers can serve to enhance the
quality of tests by providing a means for measuring test coverage. Test
coverage measurement is the process of recording whether certain user-defined
events occurred during simulation of a set of tests. When used for coverage
purposes, the FoCs checkers will evaluate the quality of the test suite by
discovering events, or scenarios, that never happen during simulation. This
feedback will guide the user as to which further tests are needed in order to
cover scenarios that have not been exercised.

² The semantics of forall are intuitive. The implementation, however, is not
trivial. It involves spawning a new automaton whenever a new value of k is
encountered. Neither the formal semantics nor the implementation are included
here due to lack of space.

The implementation of coverage checkers is similar to that of functional
checkers. The only difference is that instead of reporting an error, a coverage
checker provides a positive indication when covering the relevant scenario. An
example of a scenario to cover is: a snoop event happens twice between
read and write, which can be formulated in RCTL as follows:

    {*, read, {!snoop*, snoop, !snoop*, snoop} & {!write*}} -> {cover}
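
A coverage checker for this scenario can be sketched as a small state machine.
The Python below is a hand-written illustration, not generated FoCs code. It
assumes each cycle is a dictionary of asserted signals, that the snoops must
occur in cycles strictly after the read, and that a write in any of those
cycles cancels the match; only the earliest read since the last write is
tracked, since it always has the highest snoop count.

```python
from typing import Dict, List


def scenario_covered(trace: List[Dict[str, int]]) -> bool:
    """True if a read is followed by two snoops with no write in between."""
    active = False  # a read has been seen and not yet cancelled by a write
    snoops = 0
    for sig in trace:
        if active:
            if sig.get("write"):
                active, snoops = False, 0
            elif sig.get("snoop"):
                snoops += 1
                if snoops == 2:
                    return True  # positive indication: scenario covered
        if sig.get("read") and not active:
            active, snoops = True, 0
    return False
```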

A special case of coverage analysis is detection of simulation vacuity. While 
vacuity in model checking is defined as a failure of a subformula to influence 
the model checking results [BBER97], simulation vacuity refers to the failure of 
a set of tests to trigger a functional checker, which means that the checker did 
not influence simulation results. To detect simulation vacuity, FoCs attaches a 
coverage checker to each functional checker it generates; the coverage checker 
indicates whether the functional checker was triggered in at least one simulation 
run. 

5 Experience 

The FoCs toolset has been deployed in several projects in IBM, notably in the
GigaHertz processor development effort in Austin and in the IBM Haifa ASIC
development laboratory. FoCs has also been successfully used by the formal
verification team of Galileo Technology, Inc. The experience with FoCs in these
projects has been very favorable in terms of verification cost and quality.
Using FoCs, verification effort was reduced (reportedly by up to 50%) by using
the same formal rules for model checking at the unit level and for simulation
analysis at the subsystem and system levels. This reduction was achieved
despite the fact that the addition of checkers increases simulation time
considerably (up to a factor of two). Thousands of FoCs checkers have been
written so far, by virtue of the great ease of writing RCTL specifications and
translating them to FoCs checkers.



6 Related Work 

In a previous work, Kaufmann et al. [KMP98] described an approach to
verification in which all specifications and assumptions have the form AG(p),
where p is a Boolean formula. Both specifications and assumptions of this form
can be used in simulation, with specifications being translated into simple
checkers, and assumptions being translated into testbench code which avoids
leading the simulation into undesired states.

Canfield et al. [CES97] describe a platform called Sherlock, aiming to serve
both model checking (through translation to CTL) and simulation. Sherlock
includes a high level language for specifying reactive behavior, while we use
RCTL, a regular expression based language. While Sherlock postprocesses
simulation traces, our method works during simulation. No experience has been
described in [CES97]. Our own experience has demonstrated that the simple,
concise syntax and semantics of RCTL, coupled with online simulation checking,
is highly useful to the verification teams with whom we work.

Finally, in [SB+97], Schlipf et al. describe a methodology and tool that
inspired our work. They unify formal verification and simulation efforts by
automatically translating state machines which represent the environment
specification either to simulation behavioral models or to input for a model
checker. Boolean assertions attached to the state machines serve as AG(p)
formulas in model checking or Assert(p) in simulation. In contrast, our
solution represents the specification as temporal logic formulas rather than
state machines, for the sake of conciseness and readability.

7 Future Plans 

Although focused on checker generation for functional testing and coverage
analysis, we view FoCs as a step towards a full methodology of “Formal
Specification, Design and Verification”. We intend to provide a set of
integrated, complementary tools that will facilitate the use of formal
specification for multiple purposes. Once written, the formal specification
will:

- Serve as an executable specification; architects can experiment with it while 
defining the specification 

- Be used as a golden model to resolve ambiguities and misunderstandings during 
the implementation stage 

- Be translated into temporal formulas for model checking 

- Be translated into simulation checkers 

- Be used for derivation of coverage criteria 

- Provide hints for automatic test generation 

We believe that such a methodology, once supported by the appropriate tools,
will significantly contribute to the quality and efficiency of the design process.

1 Introduction

Formal validation of distributed systems relies on several specification
formalisms (such as the international standards LOTOS [15] or SDL [16]), and it
requires different kinds of tools to cover the whole development process.
Presently, a wide range of tools are available, either commercial or academic
ones, but none of them fulfills by itself all the practical needs.

Commercial tools (like ObjectGEODE [20], SDT [1], STATEMATE [14], etc.) provide
several development facilities, like editing, code generation and testing.
However, they are usually restricted to basic verification techniques
(exhaustive simulation, deadlock detection, etc.) and are “closed” in the sense
that there are only limited possibilities to interface them with others. On the
other hand, there exist many academic tools (like SMV [19], HYTECH [12],
KRONOS [22], UPPAAL [18], SPIN [13], INVEST [2], etc.) offering a broad
spectrum of quite efficient verification facilities (symbolic verification,
on-the-fly verification, abstraction techniques, etc.), but often supporting
only low-level input languages. This may restrict their use at an industrial
scale.

This situation motivated the development of IF, an intermediate representation
for timed asynchronous systems together with an open validation environment.
This environment fulfills several requirements. First of all, it is able to
support different validation techniques, from interactive simulation to
automatic property checking, together with test case and executable code
generation. Indeed, all these functionalities cannot be embodied in a single
tool and only tool integration facilities can provide all of them. For the sake
of efficiency, this environment supports several levels of program
representations. For instance, it is well-known that model-checking
verification of real life case studies usually needs to combine different
optimization techniques to overcome the state explosion problem. In particular,
some of these techniques rely on a syntactic level representation (like static
analysis and computations of abstractions) whereas other techniques operate on
the underlying semantic level. Another important feature is to keep this
environment open and evolvable. Therefore, tool connections are performed by
sharing either input/output formats, or libraries of components. For this
purpose several well-defined application programming interfaces (APIs) are
provided.

* Work partially supported by Région Rhône-Alpes, France

2 Architecture



The IF validation environment relies on three levels of program representation: the
specification level, the IF intermediate level, and the LTS semantic model level.
Figure 1 describes the overall architecture and the connections between the toolbox
components.




Fig. 1. An open validation environment for if 



The specification level is the initial program description, expressed for instance
using an existing language. To be processed, this description is (automatically)
translated into its IF representation. Currently the main input specification
formalism we consider is SDL, but connections with other languages such as LOTOS or
PROMELA would also be possible.

The intermediate level corresponds to the IF representation [7]. In IF, a system
is expressed by a set of parallel processes communicating either asynchronously
through a set of buffers, or synchronously through a set of gates. Processes are based
on timed automata with deadlines [3], extended with discrete variables. Process
transitions are guarded commands consisting of synchronous/asynchronous inputs
and outputs, variable assignments, and clock settings. Buffers have various queuing
policies (FIFO, stack, bag, etc.), can be bounded or unbounded, and reliable or lossy.

A well-defined API allows the abstract tree of the IF representation to be consulted
and modified. Since all the variables, clocks, buffers and the communication structure
are still explicit, high-level transformations based on static analysis (such as live
variables computation) or program abstraction can be applied. Moreover, this API is
also well suited to implementing translators from IF to other specification formalisms.

The semantic model level gives access to the LTS representing the behaviour of
the IF program. Depending on the application considered, three kinds of API are
proposed:

• The implicit enumerative representation consists of a set of C functions and
data structures that compute on demand the successors of a given state
(following the OPEN-CAESAR [11] philosophy). This piece of C code is generated
by the if2c compiler, and it can be linked with a “generic” exploration program
performing on-the-fly analysis.

• In the symbolic representation, sets of states and transitions of the LTS are
expressed by their characteristic predicates over a set of finite variables. These
predicates are implemented using decision diagrams (BDDs). Existing applications
based on this API are symbolic model-checking and minimal model generation.

• Finally, the explicit enumerative representation simply consists of an LTS file
with an associated access library. Although such an explicit representation is
not suitable for handling large systems globally, it is still useful in practice to
minimize some of its abstractions with respect to bisimulation based relations.
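
The implicit enumerative style can be illustrated with a small sketch. It is written
in Python rather than C, and the names explore, toy_successors and the toy counter
system are hypothetical; they are not part of the IF or OPEN-CAESAR interfaces. The
only point is that a generic exploration routine needs nothing more than a function
returning the successors of a state on demand.

```python
from collections import deque

def explore(initial, successors, is_bad):
    """Breadth-first on-the-fly search.

    `successors(state)` returns (label, next_state) pairs on demand, so the
    state graph is never built in advance. Returns a path to the first state
    satisfying `is_bad`, or None if no such state is reachable.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for _label, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical toy system: a counter modulo 4 with "inc" and "reset" moves.
def toy_successors(state):
    return [("inc", (state + 1) % 4), ("reset", 0)]

trace = explore(0, toy_successors, lambda s: s == 3)
print(trace)
```

Swapping toy_successors for another successor function changes the system under
analysis without touching the exploration code, which is the separation the implicit
representation is meant to provide.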

3 Components Description 

We briefly present here the main components of the environment, together with 
some external tools for which a strong connection exists. 

The specification level components. ObjectGEODE [20] is a commercial toolset
developed by TTT supporting SDL, MSC and OMT. In particular, this toolset
provides an API to access the abstract tree generated from an SDL specification. We
have used this API to implement the sdl2if translator, which generates operationally
equivalent IF specifications from SDL ones. Given the static nature of IF, this
translation does not cover the dynamic features of SDL (e.g., process instance
creation).

The intermediate level components. live [5] implements several algorithms
based on static analysis to transform an IF specification. A first transformation
concerns dead variable resetting (a variable is dead at some control point if its value
is not used before being redefined). This optimisation can also be applied to buffer
contents (a message parameter is dead if its value is not used when the message is
consumed). Although very simple, such an optimisation is particularly efficient for
state space generation (reductions up to a factor of 100 were frequently observed),
while preserving the exact behaviour of the original specification. A second
transformation is based on the slicing technique [21]. It automatically abstracts a
given specification by eliminating the parts that are irrelevant w.r.t. a given
property or test purpose [6].
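
To make the notion of a dead variable concrete, here is a minimal Python sketch of a
standard backward live-variable analysis on a control graph. It is not the algorithm
of live [5]; the transition encoding and the three-point example process are
assumptions made for illustration only.

```python
def live_variables(transitions):
    """transitions: list of (source, used_vars, defined_vars, target).

    A variable is live at a control point if some transition leaving that
    point uses it, or passes it on (without redefining it) to a point
    where it is live. Computed as a least fixpoint.
    """
    points = {t[0] for t in transitions} | {t[3] for t in transitions}
    live = {p: set() for p in points}
    changed = True
    while changed:
        changed = False
        for src, uses, defs, dst in transitions:
            new = set(uses) | (live[dst] - set(defs))
            if not new <= live[src]:
                live[src] |= new
                changed = True
    return live

def dead_variables(transitions, variables):
    """Variables that may safely be reset at each control point."""
    live = live_variables(transitions)
    return {p: set(variables) - lv for p, lv in live.items()}

# Hypothetical process: p0 --x := input--> p1 --y := x+1--> p2 --output y--> p0
example = [
    ("p0", [], ["x"], "p1"),
    ("p1", ["x"], ["y"], "p2"),
    ("p2", ["y"], [], "p0"),
]
dead = dead_variables(example, ["x", "y"])
print(dead)
```

At p2 the variable x is dead, so resetting it to a fixed value there merges all
states that differ only in the stale value of x, which is where the state space
reduction comes from.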

if2pml [4] is a tool developed at Eindhoven TU to translate IF specifications into
PROMELA.

The semantic model level components. CADP [9] is a toolset for the verification
of LOTOS specifications. It is developed by the VASY team of Inria Rhône-Alpes
and VERIMAG. Two of its model-checkers are connected to the IF environment:
ALDEBARAN (bisimulation based), and EVALUATOR (alternation-free μ-calculus).




546 M. Bozga et al. 



For both tools, diagnostic sequences are computed on the LTS level and they can be
translated back into MSC to be observed at the specification level.

KRONOS [22] is a model-checker for symbolic verification of TCTL formulae on
communicating timed automata. The current connection with the IF environment is as
follows: control states and discrete variables are expressed using the implicit
enumerative representation, whereas clocks are expressed using a symbolic
representation (particular polyhedra).

TGV [10] is a test sequence generator for conformance testing of distributed systems
(joint work between VERIMAG and the PAMPA project of IRISA). Test cases are
computed during the exploration of the model and they are selected by means of
test purposes.

4 Results and Perspectives 

The IF environment has already been used to analyze some representative SDL
specifications, like SSCOP, an ATM signalling layer protocol [8], and MASCARA, an ATM
wireless transport protocol. It is currently used in several ongoing industrial case
studies, including the real-time multicast protocol PGM and the control part of the
Ariane 5 launcher flight sequencer. The benefits of combining several techniques,
working at different program levels, were clearly demonstrated. In particular,
traditional model-checking techniques (as provided by ObjectGEODE) were not sufficient
to complete the analysis of these large examples.

Several directions can be investigated to improve this environment. 

First of all, formalisms other than SDL could be connected to IF. In particular, the
translation from a subset of UML is envisaged. To this purpose new features will be
added to handle dynamic process creation and parametrized network specifications.
From the verification point of view, the results obtained using the currently
implemented static analysis techniques are very encouraging. We now plan to experiment
with some more sophisticated algorithms implemented in the INVEST tool [2], such as
structural invariant generation and general abstraction computation techniques for
infinite space systems.

Another promising approach to verifying large systems consists in generating their
underlying model in a compositional way: each sub-system is generated in isolation,
and the resulting LTSs are minimized before being composed with each other. The IF
environment offers all the components required to experiment with this approach in an
asynchronous framework [17].

The IF package can be downloaded at http://www-verimag.imag.fr/DIST_SYS/IF.
There is a growing trend to integrate theorem proving systems with specialized
decision procedures and model checking systems. The proving capabilities
of the PVS theorem prover, for example, have been improved considerably by
extending it with new proof tactics based on a BDD package, a μ-calculus
model-checker [4], and a polyhedral library. In this way, a theorem proving system
like PVS provides a common front-end and specification language for a variety of
specialized tools. This makes it possible to use a whole arsenal of verification and
validation methods in a seamless way, combine them using a strategy language,
and provide development chain analysis.

Here we describe a novel PVS tactic for deciding an interesting fragment of
PVS that corresponds to the Weak Second-order Theory of 1 Successor, WS1S.
This logic may not only be viewed as a highly succinct alternative to the use
of regular expressions, but can also be used to encode Presburger arithmetic or
quantified boolean logic. The decidability of WS1S is based on the fact that
regular languages may be characterized by logics. However, this automata-theoretic
procedure is of staggering complexity, namely non-elementary.

Although this logic-automaton connection has been known for more than 40
years, it was only through the recent work at BRICS that it became possible to
make effective use of automata-based decision procedures for logics like WS1S.
Their tool, called MONA [2], acts as a decision procedure and as a translator to
finite-state automata. It is based on new algorithms for minimizing finite-state
automata using binary decision diagrams (BDDs) to represent transition
functions in compressed form. Various applications of MONA (including hardware
verification, validation of software design constraints, and establishing safety and
liveness conditions for distributed systems) are mentioned in [2].

We are using the efficient automata-construction capabilities of MONA for 
building a tactic that decides a fragment of the PVS specification language. 
This fragment includes boolean expressions, arithmetic on the natural numbers 
restricted to addition/subtraction with/from a constant, and operations on finite 
sets over the naturals like union, intersection, set difference, addition and removal 
of a natural. Predicates include arithmetic comparisons, equality, disequality, 
the subset relation, and membership in the form of function application P(i). 

* This research was funded by DARPA AO D855 under US Air Force Rome Laboratory 
contract F30602-96-C-0204 and by the National Science Foundation Contract No. 
CCR-9712383. 
Moreover, there is quantification over the booleans, the natural numbers, finite
sets of naturals, and predicate subtypes of the aforementioned types built from
WS1S formulas. Finite sets of natural numbers may also be described using the
choice operator the. In this way, ripple-carry addition may be defined as follows.



+(P, Q: finite_set[nat]): finite_set[nat] =
  the({R | EXISTS (C: finite_set[nat]): NOT(C(0)) AND
         FORALL (t: nat):
           (C(t+1) = ((P(t) & Q(t)) OR (P(t) & C(t)) OR (Q(t) & C(t)))) &
           (R(t) = P(t) = Q(t) = C(t))});



Here a natural number k is mapped to a set of indices corresponding to 1's in the
binary representation of k; e.g., 10 is represented as {1,3}. Usually, functions are
encoded in a relational style in WS1S, but the inclusion of the choice operator
allows one to provide a functional representation. Notice that the carry vector C
is hidden by means of second-order quantification.
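
The encoding can be checked on concrete numbers with a short Python sketch. It is
an executable reading of the definition of + above, not part of the PVS tactic; the
helper names to_bits, from_bits and add are introduced here for illustration.

```python
def to_bits(k):
    """Set of positions of the 1 bits in the binary representation of k."""
    return {i for i in range(k.bit_length()) if (k >> i) & 1}

def from_bits(s):
    """Inverse of to_bits: the natural number denoted by a set of bit positions."""
    return sum(1 << i for i in s)

def add(P, Q):
    """Ripple-carry addition on bit-position sets.

    C is the carry set and R the result, mirroring the two conjuncts of the
    definition: C(t+1) is the majority of P(t), Q(t), C(t), and
    R(t) holds iff (P(t) = Q(t)) = C(t).
    """
    width = max(P | Q, default=-1) + 2   # one extra position for a final carry
    C, R = set(), set()                  # NOT C(0): no carry into position 0
    for t in range(width):
        p, q, c = t in P, t in Q, t in C
        if (p and q) or (p and c) or (q and c):
            C.add(t + 1)
        if (p == q) == c:
            R.add(t)
    return R

print(to_bits(10))
print(from_bits(add(to_bits(10), to_bits(7))))
```

Reading the chained equality as (P(t) = Q(t)) = C(t) is an assumption about how the
expression associates; under that reading it coincides with the exclusive-or of the
three bits, which is the usual sum bit of a full adder.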

This fragment of the PVS language is decided by associating an automaton
with every formula. This translation proceeds in two steps. First, definitions
are unfolded to transform formulas into the language of WS1S. In this
step, the full arsenal of theorem proving capabilities of PVS, including decision
procedures, rewriting, and lemma application, may be used to simplify the
resulting formula. The second step of the translation traverses this formula and
recursively builds up a corresponding automaton that recognizes the language of
interpretations of the PVS formula. Here we use foreign function calls to directly
call the C functions of the MONA library [2] from the Lisp implementation of
PVS.

Moreover, we use facilities provided by the Allegro Lisp garbage collector
to also take the automaton and BDD data structures of MONA into account.
In this way, we create a transparent and functional view of the MONA
capabilities. This makes it possible, for example, to memoize the translation
process. In addition, the translation of PVS formulas to deterministic finite
automata includes abstraction. Whenever the translator encounters a PVS
expression outside the scope of WS1S, it creates a new variable (of type nat, bool,
or finite_set[nat], depending on the type of the expression) and replaces the
original expression with this variable throughout the formula. This abstraction
increases the scope of our proof procedure. When the expression to be abstracted
away includes bound variables, however, the translator gives up. Abstraction has
been found particularly useful in the analysis of state machines, where the state
usually consists of a record of state components and the specification includes
accesses to these components that can easily be abstracted. This translation has
been packaged as a new PVS tactic.



stamps: LEMMA
  FORALL (i: finite_set[nat]): i >= [| 8 |] =>
    EXISTS (j, k: finite_set[nat]): i = 3*j + 5*k
ltl: THEORY
BEGIN

  time: TYPE = nat;

  tpred: TYPE = finite_set[time]

  t, t1: VAR time
  P, Q, R: VAR tpred

  [](P)(t): boolean = FORALL (t1: upfrom[t]): P(t1);

  <>(P)(t): boolean = EXISTS (t1: upfrom[t]): P(t1);

  |=(P): boolean = P(0);  % Validity

  WF(A, EA, I: tpred): tpred = ([]<>(A & NOT(I))) OR []<>(NOT(EA));

  SF(A, EA, I: tpred): tpred = ([]<>(A & NOT(I))) OR <>[](NOT(EA));

END ltl



Fig. 1. Encoding of a Linear Temporal Logic 



Consider, for example, the Presburger formula stamps, where * is defined by
iterating addition as defined above, and [| k |] is a recursively defined function
for computing the finite set that is the unsigned binary representation of k.¹
This inductive property is shown to be valid by simply calling the (ws1s) tactic
within the PVS prover, since, after unfolding and unwinding all definitions,
including the recursive ones, the associated automaton recognizes the full language
of interpretations. Most proof attempts, however, fail.



  (EXISTS (x: nat): x = 100 & P!1(x))
& (EXISTS (x: nat): x = 111 & P!1(x))
& (FORALL (i: nat): i /= 100 & i /= 111 => NOT P!1(i))





This formula is neither valid nor unsatisfiable. Thus, the tactic (ws1s) returns
with a counterexample (P!1 = emptyset[nat]) and a witness (P!1 = add(111,
add(100, emptyset[nat]))) for the free variable P!1.
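
The reported counterexample and witness can be confirmed by hand, or with a few lines
of Python in which the finite set P!1 is modelled as a Python set. This is only a
sanity check of the two reported interpretations, not a decision procedure; the
function name formula is introduced here for illustration.

```python
def formula(P):
    """The example formula, with quantifiers specialised to a finite set P."""
    has_100 = 100 in P                               # EXISTS x: x = 100 & P(x)
    has_111 = 111 in P                               # EXISTS x: x = 111 & P(x)
    nothing_else = all(i in (100, 111) for i in P)   # FORALL i: i /= 100 & i /= 111 => NOT P(i)
    return has_100 and has_111 and nothing_else

counterexample = set()        # P!1 = emptyset[nat]
witness = {100, 111}          # P!1 = add(111, add(100, emptyset[nat]))

print(formula(counterexample), formula(witness))
```

Because P is finite, the universal quantifier only needs to inspect the members of P:
every natural number outside P satisfies NOT P(i) trivially.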

Using the approach outlined above we have encoded various theories in
WS1S: Presburger arithmetic, lossy queues, regular expressions, a restricted
linear temporal logic (LTL), and fixed-sized bit vectors. The complete encoding of
an LTL, including definitions for weak and strong fairness, is shown in Figure 1.
Boolean connectives do not have to be defined explicitly for this logic, since the
conversion mechanism of PVS [3] automatically lifts the built-in logic connectors
to the type tpred. Regular expressions are represented by means of a datatype,
and a recursively defined meaning function associates the WS1S defined set of

¹ Notice that [| k |] is not a WS1S arithmetic relation, because otherwise one
could define addition directly on the natural numbers; but this is outside the scope
of WS1S.
words recognized by this regular expression; here, a word is of type [below[N]
-> finite_set[nat]], where N specifies an alphabet of size expt(2,N).

Notice that we automatically have a decision procedure for the combination
of the encoded theories mentioned above, since all encodings are definitional
extensions of the base language of WS1S. Moreover, the list of encoded theories
is open-ended.

Among other applications we have been using the (ws1s) tactic for discharging
verification conditions that have been generated using the abstraction method
described in [1]. These examples usually involve a combination of the lossy
queue theory and regular expressions. Also, quantifier reasoning and Skolemization
have to be used to transform the formulas under consideration into the
fragment supported by the translation process. For example, universal-strength
quantification over words needs Skolemization, since words are elements of a
third-order type (see above). After preprocessing and unfolding the definitions,
verification conditions are typically proved within a fraction of a second.

Altogether, the WS1S decision procedures enrich PVS by providing new
automated proving capabilities, an alternative approach for combining decision
procedures, and a method for generating counterexamples from failed proof
attempts. On the other hand, this integration provides an appealing frontend and
input language for the MONA tool, and permits using automata-theoretic decision
procedures in conjunction with both traditional theorem proving techniques
and specialized symbolic analysis like abstraction. This connection, however,
needs further exploration.

The WS1S decision procedure has been integrated with PVS 2.3, which was
released in fall ’99. PVS 2.3 is freely available at pvs.csl.sri.com.

Acknowledgements. We would like to thank A. Møller for clarifying discussions
about MONA internals, M. Sorea for comments on this paper, and S.
Bensalem for providing interesting test cases.
Abstract. We describe here the Pet (standing for path exploration
tool) system, developed at Bell Labs. This new tool allows interactive
testing of sequential or concurrent programs, using techniques taken from
deductive program verification. It automatically generates and displays
a graphical representation of the flow graph, and links the visual
representation to the code. Testing is done by selecting execution paths, or,
in the case of concurrent programs, interleaved sequences of code. The
Pet system calculates the exact condition for executing the selected path,
in terms of the program variables. It also calculates (when possible)
whether this condition is vacuous (never satisfied) or universal (always
satisfied). The user can then edit the path and select variants of it by
either extending it, truncating it, or switching the order of appearance
of concurrent events. This testing approach is not limited to finite state
systems, and hence can be used in cases where a completely automatic
verification cannot be applied.



1 Introduction 

Software testing is the most commonly used method for enhancing the quality
of computer software. Testing is usually done in a rather informal way, such as
by walking through the code or inspecting the various potential pitfalls of the
program [1]. Methods that are more formal, such as model checking or deductive
theorem proving, which have been used very successfully for hardware
verification, have not succeeded in gaining superiority in the area of software
reliability. Deductive verification of actual software systems is very time
consuming, while model checking suffers from the state space explosion problem, and
is, in most cases, restricted to handling finite state systems.

Concurrent programs may exhibit complicated interactions that can make 
debugging and testing them a difficult task. We describe here a tool that helps 
the user to test sequential or concurrent software using a graphical interface. 
It allows the user to walk through the code by selecting execution paths from 
the flow graph of the program. The most general relation between the program 
variables that is necessary in order to execute the selected path is calculated 
and reported back to the user. An attempt is made to decide using the path 
condition whether the path is at all executable. The user can edit the execution 
paths by adding, truncating and exchanging (in case of a concurrent program) 
the order of the transitions. 

The Pet tool is based on symbolic computation ideas taken from program
verification. It allows more general ways of debugging programs than simulating
one execution of the checked code at a time, as each path of the flow graph
corresponds to all the executions that are consistent with it.

The input language of the current implementation of Pet is Pascal, extended
with communication and synchronization constructs for concurrent programming.
However, Pascal is just one possible choice. The main principles on which
this tool is based can be used with other formalisms. In fact, it is quite easy to
change the input language from Pascal to, e.g., C, SDL or VHDL.

Part of the problem in software testing and verification is coping with
scalability. The Pet tool also contains an abstraction algorithm, which can be applied
during the compilation of the code. The algorithm attempts to abstract out
certain variables and presents a projection of the program, that is, a simplified
version in which some of the variables are abstracted away. This allows the user to
better understand certain aspects of the code.

2 Pet: Path Exploration Tool 

A flow graph is a visual representation of a program. In the Pet system [2], 
a node in a flow graph is one of the following: begin, end, predicate, random, 
wait, assign, send or receive. The begin and end nodes appear as ovals, the 
predicate, wait and random nodes appear as diamonds, labeled by a condition, 
or the word random, in the latter case. All other nodes appear as boxes labeled 
by the assignment, send or receive statement. 

Each node, together with its output edge constitutes a transition, i.e., an 
atomic operation of the program, which can depend on some condition (e.g., 
the current program counter, an if-then-else or a while condition in the node, 
the nonemptiness of a communication queue) and make some changes to the 
program variables (including message queues and program counters). Notice 
that a predicate node corresponds to a pair of transitions: one with the predicate 
holding (corresponding to the ‘yes’ outedge), and one with the predicate not 
holding (corresponding to the ‘no’ outedge). 

Unit testing [1] is based on examining paths. Different coverage techniques
suggest criteria for the appropriate coverage of a program by different paths.
Our tool leaves the choice of the paths to the user (a future version will allow
a semi-automatic choice of the paths, using various coverage algorithms to
suggest the path selection, e.g., based on the coverage techniques in [1,
4]). The user can choose a path by clicking on the appropriate nodes on the flow
graph.

In order to make the connection between the code, the flow chart and the 
selected path clearer, sensitive highlighting is used. For example, when the cursor 
points at some predicate node in the flow graph window, the corresponding text 
is highlighted in the process window. The code corresponding to a predicate 
node can be, e.g., an if-then-else or a while condition. 

Once a path is fixed, the condition to execute it is calculated, based on
repeated symbolic calculation of preconditions, as in program verification [3].
The condition is calculated backwards, starting with the postcondition true at the
end of the selected path. Thus, we proceed from the postcondition of a transition
in order to calculate its precondition, applying various transformations to the
current condition until we arrive at the beginning of the path. For a transition
that consists of a predicate p with the ‘yes’ outedge, we transform the current
condition from c to c ∧ p. The same predicate with the ‘no’ outedge results in
c ∧ ¬p. For an assignment of the form x := e, we replace every (free) occurrence
of the variable x in the postcondition c by the expression e. Other kinds of
transitions will be discussed later.
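
The backward calculation can be sketched in Python as follows. This is an
illustrative reconstruction of the rules just stated, not the Pet implementation:
expressions are nested tuples, the step kinds "assign", "yes" and "no" are a
hypothetical encoding of transitions, and no simplification of the result is done.

```python
import operator

# Expressions: ("var", name), ("const", value), ("not", e) or (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, ">": operator.gt,
       "<": operator.lt, "=": operator.eq, "and": lambda a, b: a and b}

def subst(expr, name, repl):
    """Replace every occurrence of variable `name` in `expr` by `repl`."""
    tag = expr[0]
    if tag == "var":
        return repl if expr[1] == name else expr
    if tag == "const":
        return expr
    return (tag,) + tuple(subst(arg, name, repl) for arg in expr[1:])

def path_condition(path):
    """Walk the path backwards, starting from the postcondition true."""
    cond = ("const", True)
    for kind, *args in reversed(path):
        if kind == "assign":            # x := e  gives  c[e/x]
            var, e = args
            cond = subst(cond, var, e)
        elif kind == "yes":             # predicate p holds: c and p
            cond = ("and", cond, args[0])
        elif kind == "no":              # predicate p fails: c and not p
            cond = ("and", cond, ("not", args[0]))
    return cond

def evaluate(expr, env):
    """Evaluate an expression in an environment mapping variables to values."""
    tag = expr[0]
    if tag == "var":
        return env[expr[1]]
    if tag == "const":
        return expr[1]
    if tag == "not":
        return not evaluate(expr[1], env)
    return OPS[tag](evaluate(expr[1], env), evaluate(expr[2], env))

x, y = ("var", "x"), ("var", "y")
path = [
    ("assign", "x", ("+", x, ("const", 1))),   # x := x + 1
    ("yes", (">", x, ("const", 3))),           # x > 3 holds
    ("assign", "y", x),                        # y := x
    ("no", (">", y, ("const", 10))),           # y > 10 does not hold
]
cond = path_condition(path)
print(evaluate(cond, {"x": 3}), evaluate(cond, {"x": 2}))
```

The resulting condition mentions only x, since y is assigned before it is tested;
it amounts to x + 1 > 3 together with the negation of x + 1 > 10.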

Pet then allows altering the path by removing nodes from the end, in the reverse
order of their selection, or by appending new nodes. This allows, for
example, the selection of an alternative choice for a condition (after the nodes
that were chosen past that predicate node are removed). Another way to alter
a path is to use the same transitions but allow a different interleaving of them.
When dealing with concurrent programs, the way in which the execution of transitions
from different nodes is interleaved is perhaps the foremost source of errors. The
Pet tool allows the user to flip the order of adjacent transitions on the path,
when they belong to different processes.

The most important information that is provided by Pet is the condition to
execute a selected path. The meaning of the calculated path condition is different
for sequential and for concurrent or nondeterministic programs. In a sequential
deterministic program, the condition expresses exactly the possible assignments
that would ensure executing the selected path, starting from the first selected
node. When concurrency or nondeterminism is allowed, because of possible
alternative interleavings of the transitions or alternative nondeterministic choices,
the condition expresses the assignments that would make the execution of the
selected path possible. The path condition obtained in this process is simplified
using rewriting rules, based on arithmetic. Subexpressions that contain only
integer arithmetic without multiplication (Presburger arithmetic) are further
simplified using decision procedures (see [2]). In this case, we can also check
algorithmically whether the path condition is equivalent to false (meaning that
this path can never be executed), or to true.

In order to allow testing of communication protocols, one needs to add send
and receive communication operations. Our choice is that of asynchronous
communication, as it seems to be more frequently used. The syntax of Pascal
is then extended with two types of transitions:



ch!exp The calculated value of the expression exp is added to the end of the
communication queue ch (we will henceforth assume the queues are not
limited to any particular capacity).

ch?var The first item of the communication queue ch is removed and assigned
to the variable var. This transition cannot be executed if the queue ch is
empty.

Typical communication protocols would allow a concurrent process to wait
for the first communication arriving from one out of multiple available
communications. The random construct represents nondeterministic choice and can be
used for that. An example for a choice of one out of three communications is as
follows:

if random then
    if random then ch1?x
    else ch2?z
else ch3?t

For the communication constraints, the translation algorithm scans the path
in the forward direction. Whenever a send transition of the form ch!exp occurs,
it introduces a new temporary variable, say temp, and replaces the transition
by the assignment temp:=exp. It also adds temp to a queue, named as the
communication channel ch. When a receive transition ch?var occurs, the oldest
element temp in the queue ch is removed, and the receive transition is replaced
by var:=temp. This translation produces a path that is equivalent to the original
one in the case that all the queues were empty prior to the execution of the path.
We can easily generalize this to allow the case where there are values already in
the queues when the execution of the path begins.
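
A minimal Python sketch of this forward scan is given below. The tuple encoding of
transitions and the function name translate are assumptions made for illustration;
expressions are kept as plain strings, and the queues are assumed to be empty at the
start of the path.

```python
from collections import defaultdict, deque

def translate(path):
    """Replace sends and receives by assignments through fresh temporaries."""
    queues = defaultdict(deque)   # channel name -> temporaries still in the queue
    counter = 0
    result = []
    for step in path:
        if step[0] == "send":                 # ("send", channel, expression)
            _, ch, exp = step
            counter += 1
            temp = f"temp{counter}"
            queues[ch].append(temp)
            result.append(("assign", temp, exp))
        elif step[0] == "recv":               # ("recv", channel, variable)
            _, ch, var = step
            temp = queues[ch].popleft()       # assumes a matching earlier send
            result.append(("assign", var, temp))
        else:
            result.append(step)               # other transitions are kept as is
    return result

path = [
    ("assign", "x", "3"),
    ("send", "ch1", "x+y"),
    ("send", "ch1", "x"),
    ("assign", "x", "4"),
    ("recv", "ch1", "t"),
    ("recv", "ch1", "z"),
    ("pred", "z>3"),
    ("assign", "z", "z+1"),
]
translated = translate(path)
for step in translated:
    print(step)
```

The example path is the one used in Figure 1; a receive with no matching earlier
send raises an IndexError in this sketch, which corresponds to the case of non-empty
initial queues that is not handled here.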



Transition    queue               transformed    condition

P1:x:=3       ch=()                              false
P1:ch1!x+y    ch=(temp1)          temp1:=x+y     x > 3
P1:ch1!x      ch=(temp1, temp2)   temp2:=x       x > 3
P1:x:=4       ch=(temp1, temp2)                  temp2 > 3
P2:ch1?t      ch=(temp2)          t:=temp1       temp2 > 3
P2:ch1?z      ch=()               z:=temp2       temp2 > 3
P2:z>3        ch=()                              z > 3
P2:z:=z+1     ch=()                              true



Fig. 1. A path with its calculated condition 



In Figure 1, the replacement is applied to a path with communication
transitions. The first column describes the path. The second column denotes the
(single in this example) queue used to facilitate the translation. The third
column denotes the translated transitions (it is left empty in the cases where the
translation maintains the original transition). The last column gives the
calculated path condition (calculated backwards).

The user can select to project out a set of program variables. Pet checks if 
there are assignments to other variables that use any of the projected variables. 
If there are, it reports to the user which additional variables need to be projected 
out. The projection algorithm removes the assignments to the projected variables 
and replaces predicates that use them by a nondeterministic choice. 
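
One possible reading of this projection step is sketched below in Python. The
statement encoding and both function names are hypothetical, and the sketch assumes
that the set of variables to project out is closed transitively (a variable
computed from a projected variable must itself be projected); the text above does
not say whether Pet iterates this check.

```python
def projection_closure(assignments, projected):
    """assignments: list of (target, used_vars).

    Returns the requested variables plus every variable whose assignments
    depend, directly or indirectly, on a projected variable.
    """
    closed = set(projected)
    changed = True
    while changed:
        changed = False
        for target, used in assignments:
            if target not in closed and closed & set(used):
                closed.add(target)
                changed = True
    return closed

def project(statements, projected):
    """statements: ("assign", target, used_vars) or ("pred", used_vars)."""
    result = []
    for stmt in statements:
        if stmt[0] == "assign" and stmt[1] in projected:
            continue                          # drop assignments to projected variables
        if stmt[0] == "pred" and projected & set(stmt[1]):
            result.append(("random",))        # predicate becomes a nondeterministic choice
        else:
            result.append(stmt)
    return result

# Hypothetical program: a := b + 1; c := a; d := 5, with tests on a and on d.
closure = projection_closure([("a", ["b"]), ("c", ["a"]), ("d", [])], {"b"})
program = [("assign", "a", ["b"]), ("pred", ["a"]), ("assign", "d", []), ("pred", ["d"])]
projected_program = project(program, closure)
print(sorted(closure), projected_program)
```

Asking to project out b alone forces a and c to be projected as well, and the test
on a becomes a nondeterministic choice while the test on d is kept.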
1 Introduction



In earlier work, Necula and Lee developed proof-carrying code (PCC) [3,5], which
is a mechanism for ensuring the safe behavior of programs. In PCC, a program
contains both the code and an encoding of an easy-to-check proof. The validity
of the proof, which can be automatically determined by a simple proof-checking
program, implies that the code, when executed, will behave safely according to
a user-supplied formal definition of safe behavior. Later, Necula and Lee
demonstrated the concept of a certifying compiler [6,7]. Certifying compilers promise
to make PCC more practical by compiling high-level source programs into
optimized PCC binaries completely automatically, as opposed to depending on
semi-automatic theorem-proving techniques. Taken together, PCC and certifying
compilers provide a possible solution to the code safety problem, even in
applications involving mobile code [4].

In this paper we describe a PCC architecture comprising two tools: 

— A thin PCC layer implemented in C that protects a host system from unsafe 
software. The host system can be anything from a desktop computer down 
to a smartcard. The administrator of the host system specifies a safety policy 
in a variant of the Edinburgh Logical Framework (LF) [1]. This layer loads 
PCC binaries, which are Intel x86 object files that contain a .lf section
providing a binary encoding of a safety proof, and checks them against the 
safety policy before installing the software. 

— A software-development tool that produces x86 PCC binaries from Java 
.class files. It is implemented in Objective Caml [2]. From a developer’s 
perspective, this tool works just like any other compiler, with an interface 
similar to javac or gcc. Behind the scenes, the tool produces x86 machine 
code along with a proof of type safety according to the Java typing rules. 

The demonstration will use a small graphics program to show that this archi- 
tecture delivers Java safety guarantees without sacrificing the performance of 
native compilation. 
2 Architecture

Figure 1 shows the architecture of our PCC system. The right side of the figure
shows the software-development process, and the left side shows the secure host
system. VC stands for “verification condition”.




[Figure omitted; panel labels: Host, Code Producer]



Fig. 1. The architecture of our PCC implementation. 



The right side of Figure 1 is a code-production tool that the user runs offline
to generate Intel x86 PCC binaries from Java bytecode. First, a compiler
generates x86 native code from a Java .class file. This compiler is largely conventional
except that it attaches some logical annotations to the resulting binary. These
annotations are “hints” that help the rest of the system understand how the
native code output corresponds to the bytecode input. These annotations are
easy for the compiler to generate from the bytecode, but would be difficult for
the rest of the system to “reverse engineer” from unannotated native code.

The annotated x86 binary is then analyzed by a tool called a verification- 
condition generator, or VC generator. The VC generator is parameterized by 
a set of axioms and rules, specified in the Edinburgh Logical Framework (LF), 
that describe the safety policy against which the binary must be proven. The 
VC generator outputs a logical predicate, also expressed in LF, that describes a 
precondition that, if true, would imply that any possible execution of the binary 
is safe. Intuitively speaking, it performs this task by scanning each native-code 
instruction and emitting safety conditions as they arise. The result is called the 
verification condition, or VC. 

Finally, the VC is sent to an automated theorem prover, which, using the 
same axioms and rules, attempts to prove the VC and, if successful, outputs the 
resulting logical proof in binary form. The annotations and proof are added to 
the binary as an .lf segment, thus producing a PCC binary. This object file can 
be loaded and linked with existing tools just like any other object file, in which 
case the .lf section is ignored. Currently, proofs are 10-40% of the code size, 
but preliminary results with a new proof-representation technology indicate that 
this can be decreased to 5-10% [8]. 

Now we turn to the host on the left side of Figure 1. The host runs a thin 
PCC layer that loads PCC binaries and verifies them. The host first separates the 
annotated binary from the proof. In Figure 1, these are shown already separated. 
It then runs a VC generator on the annotated binary to produce a VC from a 
safety policy specified by the same set of rules and axioms.^ Lastly, it checks 
the proof to make sure that it is indeed a valid proof under the safety policy. If 
it checks out, then the unannotated binary is installed on the host system. The 
annotations and the proof are discarded. 
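
The host-side flow just described can be illustrated with a toy sketch. It is not the C implementation discussed in this paper: the instruction set, the single bounds rule standing in for an LF safety policy, and all names (PccBinary, vc_gen, prove, check, load) are assumptions made for illustration only. In this toy the checker could decide safety by itself, so the proof is redundant; the point is only the shape of the pipeline, in which the host regenerates the VC from the code it received and accepts the binary only if the shipped proof matches that VC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PccBinary:
    code: tuple   # toy instructions such as ("store", 3) or ("nop", 0)
    proof: tuple  # one proof step per verification condition


def vc_gen(code, mem_size):
    """Scan the instructions and emit one safety condition per store."""
    return tuple(("in_bounds", arg, mem_size) for op, arg in code if op == "store")


def prove(vc):
    """Untrusted producer side: justify each condition or give up."""
    steps = []
    for _, index, size in vc:
        if not 0 <= index < size:
            raise ValueError(f"cannot prove that store to {index} is in bounds")
        steps.append(("bounds_rule", index, size))
    return tuple(steps)


def check(vc, proof):
    """Trusted host side: every step must justify the matching condition."""
    if len(vc) != len(proof):
        return False
    for (_, index, size), step in zip(vc, proof):
        if step != ("bounds_rule", index, size) or not 0 <= index < size:
            return False
    return True


def produce(code, mem_size):
    """Producer pipeline: native code -> VC -> proof -> PCC binary."""
    return PccBinary(code, prove(vc_gen(code, mem_size)))


def load(binary, mem_size):
    """Host pipeline: regenerate the VC, check the proof, then install."""
    vc = vc_gen(binary.code, mem_size)
    if not check(vc, binary.proof):
        return None        # reject the binary as potentially unsafe
    return binary.code     # install the code only; the proof is discarded


good = produce((("store", 1), ("nop", 0), ("store", 3)), mem_size=4)
tampered = PccBinary((("store", 1), ("nop", 0), ("store", 9)), good.proof)
print(load(good, 4) is not None, load(tampered, 4) is None)
```

Because the host derives the VC from the code itself, editing the code after the proof was produced changes the VC and the old proof no longer fits it.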

The PCC layer on the host illustrates another key engineering point that 
makes the PCC architecture viable — checking a proof is usually much faster and 
simpler than generating a proof. It is important that it is fast because it is 
happening dynamically, as software is downloaded into the host. It is important 
that it is simple because the trusted computing base (TCB) on the host must be 
bug-free. Furthermore, some of our current applications involve small embedded 
systems that lack memory resources to run large programs. For these reasons, it 
is unacceptable to run a complex verifier on the host system. 

In contrast, PCC development tools such as the code producer on the right 
side of Figure 1 can be slow and buggy without compromising the soundness of 
the basic architecture that we are proposing in this demonstration. For instance, 
the compiler is free to perform aggressive optimizations, even though that might 
present difficult and time-consuming problems for the proof generator, because 
proof generation is done offline. Technically, this result is provided by a soundness 
theorem [6]. This result is not specific to any particular code-production tool, 
to Java, or to any particular safety policy, but rather is a property of the PCC 
architecture itself. 

3 Demonstration 

We have implemented prototypes of both sides of Figure 1 for the Intel x86 ar- 
chitecture. The certifying compiler performs register allocation and some global 
optimizations, including a form of partial redundancy elimination. The VC ge- 
nerator and proof checker are quite complete and have been stable for several 
months. The compiler and proof generator, on the other hand, are still under 
heavy development. At present, the compiler handles a large subset of the Java 
features, including objects, exceptions, and floating-point arithmetic. However, 
there are several key features that have yet to be implemented, including threads 
and dynamic class loading. Also, a number of important optimizations are not 
yet finished, including the elimination of null-pointer and array-bounds checks, 
and the stack allocation of non-escaping objects. 

^ The system that we demonstrate shares the same VC generator between the two 
sides of the architecture, and this is often a convenient approach. Conceptually, 
however, the host’s module is part of the TCB and thus must be bug-free, while the 
code producer’s module is not trusted, and so bugs will simply produce incorrect 
binaries that will be caught by the host before installation. 

The demonstration will use a small graphics program to show that this ar- 
chitecture delivers Java safety guarantees without sacrificing the performance of 
native compilation. The demonstration will compare three different approaches 
to transmitting untrusted code to a secure host: 

1. Java bytecode, verified and run by a JVM on the host (safe but slow) 

2. x86 native code produced by a C compiler, run via the Java Native Interface 
(JNI) on the host (fast but unsafe) 

3. x86 native code produced by our certifying compiler, run via JNI on the host 
(fast and safe) 

We demonstrate the safety of our approach (3) by showing that various forms of 
tampering with the PCC binary cause the host to reject the binary as potentially 
unsafe. 

In future work, we plan to release our current system for public use. Also of 
great interest is to extend the safety policy to go beyond Java type safety, in 
particular to allow enforcement of some constraints on the use of resources such 
as execution time and memory. 
Abstract. The Statemate Verification Environment supports requirement analysis 
and specification development of embedded controllers as part of the Statemate 
product offering of I-Logix, Inc. This paper discusses key enhancements of 
the prototype tool reported in [2,5] in order to enable full scale industrial usage of 
the tool-set. It thus reports on a successfully completed technology transfer from 
a prototype tool-set to a commercial offering. The discussed enhancements are 
substantiated with performance results all taken from real industrial applications 
of leading companies in automotive and avionics. 



1 Introduction 

This paper reports on a successful technology transfer, taking a prototype verification 
system for Statemate [2,5] to a version available as commercial product offering. It is 
only through the significant improvements reported in this paper that key companies are 
now taking up this technology in an industrial setting. In particular, other approaches 
such as [15,14], though similar in overall goal, have to date not reached a comparable 
level in either quality of handling or degree of captured complexity. 

The leading players in automotive and avionics base their product developments on 
well established and highly matured process models, typically variations of the well 
known V-model originally introduced as a standard for military avionics applications. 
E.g. [16] reports on the development process and process improvements of Aerospatiale, 
documenting the benefits of a model based development process. Overall product qua- 
lity is critically dependent on the familiarity of system and software designers with the 
established process, and any change, in particular the introduction of a technology com- 
pletely novel to designers, can potentially cause significant process degradation, thus 
leading to quality reduction rather than quality enhancements. It is thus essential to tune 
layout and handling to use-cases well understood and easily appreciated by designers. 
Particular causes of concern identified in numerous discussions with industrial partners 
are: 



- scariness of formal methods: Since designers typically have a background in mechanical 
or electrical engineering, exposing any of the underlying mathematical machinery 
to them is prohibitive. We tried to maintain as much as possible of the “look and feel” of 
the simulation capabilities of the tool-set, offering in particular pure push-button 
analysis techniques described in Section 2, which completely hide the underlying 
verification technology. We spent significant effort on providing a tight integration 
with Statemate for all verification-related activities, as described in Section 4. 

- relation to testing: A standard topic of discussion is the relation to testing, which 
typically causes confusion. The techniques are complementary, but model-checking 
can take over a significant part traditionally handled by testing. In terms of the V-diagram, 
the so-called virtual V supports model based integration of components 
of subsystems, and model-checking based verification can replace all model-based 
testing (if clear interface specifications are given), as well as capture a large share of 
errors typically detected today in integration testing. We found that test engineers not 
only accept but appreciate a transition, in which the traditional activity of designing 
test sequences for properties captured only informally or even mentally is replaced 
by the process of capturing these in forms of requirement patterns (provided as 
library), leaving construction of test-sequences to the model-checker. 

- user models: Test engineers constitute only one out of a number of identified classes 
of users. While there are differences in company cultures, we feel that a second prominent 
use-case of model-checking rests in its capabilities as a powerful debugging 
tool, simply reducing the number of iterations until stable models are achieved, and 
thus reducing development costs. 

Of a slightly different nature, though similar in spirit, is the introduction of a methodology 
for supporting successive refinements of abstractions. We allow the user to selectively 
“degrade” the accuracy with which the model-checker tracks computations on selected objects 
within the cone-of-influence for a given property. In particular, all tests dependent 
on floats are over-approximated, while for objects with finite domain a more refined view 
of computations can be maintained (see Section 3). This technique is complemented by 
a symbolic evaluation engine, whose underlying machinery rests on a simplified but restricted 
version of the first-order model-checking algorithm documented in [1]; see [1] 
for a description of the use of these enhancements on automotive applications. 

While the prototype system reported in [2,5] was based on the SVE system [10], 
the current version rests on a tight integration with the VIS model-checker [11] and 
the CUDD BDD package [17], yielding on average a fivefold performance boost for 
model-generation times and a 50% speed-up for model-checking. 
and Rainer Schlor, with Peter Jansen (BMW), Joachim Hofmann and Andreas Nagele 
(DaimlerChrysler - DC), and Geoffrey Clements (British Aerospace - BAe), with the 
VIS Group, in particular Roderick Bloem, for supporting the integration, Fabio Somenzi 
for the CUDD BDD package and I-Logix, in particular Moshe Cohen. 



2 Model Analysis 

The Statemate semantics [12] contains modeling techniques which enable simultaneous 
activation of conflicting transitions or nondeterministic resolution of multiple 
write accesses to data-items. Even if these mechanisms are captured by the Statemate 
semantics, such properties prove to be errors in later design processes, where the design 
is mapped to more concrete levels of abstraction, for instance, by using code-generation 
facilities. 

The current version of the verification environment has been extended by push-button 
analysis for Statemate designs to be able to verify if some robustness properties are 
fulfilled by the system under design. These debugging facilities cover 

a) simultaneous activation of conflicting transitions, 

b) several write accesses to a single data-item in the same step*, and 

c) parallel read- and write-accesses to the same object. 

Besides these robustness checks, which are completely automated, the verification en- 
vironment offers simple reachability mechanisms to drive the simulation to some user 
provided state or property, for example, a state where some specific atomic proposition 
holds. Such analysis can be used to verify, for instance, that states indicating fatal errors 
are not reachable. In the context of testing, reachability related analysis can be used for 
achieving simulation prefixes. 

These use cases require a white-box view of the underlying model, in contrast to 
verification activities related to quality acceptance gates for complete Electronic 
Control Units (ECUs) or subsystems. This view allows the user to refer to states and 
transitions of the model, typically asking for state-invariants. 



Implementation and Results. Model analysis is performed using symbolic model-checking 
as described in Section 1. In order to use model-checking for model analysis, 
the finite state machine describing the model’s behavior is extended automatically by 
observers, which allow robustness properties to be specified as atomic propositions p. These 
propositions are then checked using simple AG(p) formulae, stating that nothing bad 
ever happens. If there is a path leading to some state $\sigma$ with $\sigma \models \neg p$, then that path is a 
witness for erroneous modeling. This path can be used to drive the Statemate simulator 
as described in [2,4]. 
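
The idea of checking AG(p) by searching for a path to a state violating p can be sketched as follows. This is an explicit-state breadth-first search, deliberately swapped in for the symbolic (BDD-based) model-checking that the environment actually uses; the function name find_violation and the small cyclic model in the usage lines are assumptions made for illustration.

```python
from collections import deque


def find_violation(initial, successors, p):
    """Return a shortest path from an initial state to a state violating p,
    or None if p holds in every reachable state (i.e. AG(p) holds)."""
    parent = {state: None for state in initial}
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        if not p(state):
            # Walk the parent links back to an initial state.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Hypothetical model: six states on a cycle, each step advances by 1 or 2.
def step(state):
    return [(state + 1) % 6, (state + 2) % 6]


witness = find_violation([0], step, lambda s: s != 5)
print(witness)
```

The returned path plays the role of the witness mentioned above: a concrete sequence of states that could be replayed in a simulator.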

Model analysis has been successfully applied to industrial sized applications like 
a central locking system (BMW; cf. [7]), a main component of a stores management 
system of an aircraft (BAe; cf. [2]), and an application of another leading European car 
manufacturer. Each application consists of up to 300 state-variables and 80 input-bits. 
In every model, all types of robustness failures were detected, consuming 300 seconds 
of analysis time on average on an UltraSparc II, 277MHz, equipped with 1GB of RAM. 
Note that these results have been achieved on top level designs. If these facilities are 
used during the development of a design, as intended, model analysis will be performed much 
faster as it will be applied mainly to subcomponents of a complete system. 



* Within Statemate, W/W-races are reported disregarding the values to be written simultaneously. 
Besides such an analysis, the verification environment also provides an enhanced 
W/W-race detection, also taking into account the values to be assigned, since designers often 
consider W/W-races wrt. equal values to be harmless. 

3 Propositional Abstraction 

Several different approaches have been proposed in the literature to overcome the state 
explosion problem, which is inherent even to symbolic model checking. Besides tech- 
niques for compositional reasoning, the current version of the Statemate verification 
environment provides a simple but powerful abstraction technique, which is offered to 
the user as a special verification method for component proofs. This abstraction method 
improves the approach described in [2]. 

State of the art model checkers like VIS [11] compute the cone-of-influence (COI) 
of a model $\mathcal{M}$ regarding a requirement $\varphi$ before verification is initiated. The COI of $\mathcal{M}$ 
contains those variables of $\mathcal{M}$ which may influence the truth value of $\varphi$. As the COI 
preserves the exact behavior of $\mathcal{M}$ regarding $\varphi$, it potentially still contains very complex 
computations for its variables, and this complexity often remains too high for successful 
verification. On the other hand, the COI provides useful information about dependencies 
of objects within $\mathcal{M}$ when verifying $\varphi$. 
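
A cone-of-influence computation amounts to a transitive closure over variable dependencies. The following minimal sketch assumes the model is given as a dictionary mapping each variable to the variables read when computing its next value; the function name and the example variables are hypothetical and do not reflect how VIS or the Statemate environment represent models.

```python
def cone_of_influence(deps, property_vars):
    """Collect every variable that may influence the property variables.

    deps maps a variable to the variables read when its next value is
    computed; variables without an entry are treated as free inputs.
    """
    cone = set()
    stack = list(property_vars)
    while stack:
        var = stack.pop()
        if var not in cone:
            cone.add(var)
            stack.extend(deps.get(var, ()))
    return cone


# Hypothetical dependencies of a small controller model.
deps = {
    "lock": ["key", "speed"],
    "speed": ["sensor"],
    "lamp": ["switch"],
}
print(sorted(cone_of_influence(deps, ["lock"])))
```

Variables outside the cone (here lamp and switch) cannot affect the requirement and can be dropped before verification.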

Even if the COI contains all variables which may influence requirement $\varphi$, some of 
them can safely be omitted if $\varphi$ must hold for any value of these variables. If the exact 
valuation for some variable $x$ is not required to verify $\varphi$, all computations for its 
concrete representation can also be omitted. The propositional abstraction supported in the 
Statemate verification environment provides a mechanism to automatically compute 
an over-approximation $\mathcal{M}_a$ of $\mathcal{M}$ wrt. a user selected set of variables from within the 
COI. Every computation required to build $\mathcal{M}_a$ is performed on a higher level language 
SMI [3], which serves as intermediate format in the Statemate verification environment. 
Note that the COI can also be calculated on that level. Since it is not required to build $\mathcal{M}$ 
to obtain $\mathcal{M}_a$, the abstraction technique becomes suitable also for models containing 
infinite objects. Abstracted variables are eliminated and only their influence on other 
objects is maintained in $\mathcal{M}_a$. 

Let $x$ be a variable to be abstracted. Then, each condition $b$ occurring in $\mathcal{M}$ is replaced 
by $\exists x.\, b$.* Assignments to $x$ are eliminated, and if $x$ is referenced in the right hand side 
of an assignment $y := t(\ldots, x, \ldots)$, the assignment is replaced by $y := y_{inp}$, where 
$y_{inp}$ is a fresh input ranging over the domain of $y$.^ The model checking problem 
becomes simpler on the abstract model, as computations regarding $x$ are no longer 
needed. 
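
The two rewriting rules can be sketched on a tiny formula representation. The sketch follows the polarity-based approximation of the existential quantifier described in the footnote (propositions referring to the abstracted variable become true at positive positions and false at negative ones). The tuple encoding of conditions, the dictionary encoding of assignments, and all function names are assumptions for illustration and are unrelated to the SMI format.

```python
from itertools import product


def abstract_condition(cond, x, positive=True):
    """Replace atoms that mention x by a constant chosen by polarity."""
    kind = cond[0]
    if kind == "atom":                       # ("atom", name, variables)
        return ("const", positive) if x in cond[2] else cond
    if kind == "not":
        return ("not", abstract_condition(cond[1], x, not positive))
    if kind in ("and", "or"):
        return (kind,
                abstract_condition(cond[1], x, positive),
                abstract_condition(cond[2], x, positive))
    return cond                              # constants stay unchanged


def abstract_assignments(assignments, x):
    """Drop assignments to x; cut right hand sides that read x."""
    result = {}
    for target, (expr, variables) in assignments.items():
        if target == x:
            continue
        if x in variables:
            result[target] = (f"{target}_inp", frozenset())  # fresh input
        else:
            result[target] = (expr, variables)
    return result


def evaluate(cond, env):
    """Evaluate a condition under a truth assignment to atom names."""
    kind = cond[0]
    if kind == "const":
        return cond[1]
    if kind == "atom":
        return env[cond[1]]
    if kind == "not":
        return not evaluate(cond[1], env)
    left, right = evaluate(cond[1], env), evaluate(cond[2], env)
    return (left and right) if kind == "and" else (left or right)


def over_approximates(cond, abstracted, atom_names):
    """Check that cond implies abstracted under every truth assignment."""
    for values in product([False, True], repeat=len(atom_names)):
        env = dict(zip(atom_names, values))
        if evaluate(cond, env) and not evaluate(abstracted, env):
            return False
    return True


cond = ("and",
        ("atom", "door_open", frozenset({"door"})),
        ("not", ("atom", "fast", frozenset({"speed"}))))
abstracted = abstract_condition(cond, "speed")
print(abstracted, over_approximates(cond, abstracted, ["door_open", "fast"]))
```

The final check illustrates why the result is an over-approximation: every valuation that satisfies the original condition also satisfies the abstracted one.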

The verification environment offers an application methodology for propositional abstraction, 
which is based upon an iterative scheme: 

1) compute the cone-of-influence of $\mathcal{M}$ regarding $\varphi$, 

2) select a set of variables to be abstracted from $\mathcal{M}$ and compute $\mathcal{M}_a$, 

a) if $\mathcal{M}_a \models \varphi$, the requirement also holds for $\mathcal{M}$, 

b) if verification fails due to complexity or a (user defined) timeout occurs for $\mathcal{M}_a$, 
set $\mathcal{M} := \mathcal{M}_a$ and go to 1), 

c) otherwise, analyze the error-path to identify the needed increase of accuracy. 
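
The control flow of this scheme can be written down as a small driver loop. In the sketch below every callback (coi, select, abstract, check), the outcome labels, and the max_rounds bound are hypothetical placeholders; the real environment performs these steps through its user interface and the model-checker.

```python
def verify_with_abstraction(model, prop, coi, select, abstract, check,
                            max_rounds=10):
    """Iterate steps 1) and 2) until the property is proved or an
    error path needs to be inspected.

    check must return ("holds", None), ("timeout", None) or
    ("fails", error_path); max_rounds is an assumed safety bound.
    """
    for _ in range(max_rounds):
        cone = coi(model, prop)                      # step 1
        abstract_model = abstract(model, select(cone))  # step 2
        outcome, detail = check(abstract_model, prop)
        if outcome == "holds":                       # case a)
            return ("holds", None)
        if outcome == "timeout":                     # case b)
            model = abstract_model
            continue
        return ("inspect", detail)                   # case c)
    return ("gave_up", None)


# Toy stand-ins: a model is a set of variable names, abstraction removes
# variables, and checking "times out" while more than two variables remain.
result = verify_with_abstraction(
    frozenset("abcd"), "prop",
    coi=lambda m, p: m,
    select=lambda cone: {min(cone)},
    abstract=lambda m, chosen: frozenset(m - chosen),
    check=lambda m, p: ("holds", None) if len(m) <= 2 else ("timeout", None),
)
print(result)
```

Case a) is sound because the abstract model over-approximates the original one; case c) is where the user decides which abstracted variables to reinstate.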

* Since SMI does not provide existential quantification, this is approximated by replacing positive 
occurrences of propositions referring to $x$ by true, while negative ones are replaced by false. 

^ [2] reports other possible types of computations for $\mathcal{M}_a$. These types differ in the remaining 
degree of information about some abstract variable $x$ in $\mathcal{M}_a$. 

Results. The abstraction technique and its application methodology have been completely 
integrated into the verification environment. The abstraction technique itself proves 
to be powerful for industrial applications. Verification of some non-trivial safety re- 
quirements regarding a BAe application (stores management system) with up to 1200 
state-bits became possible using this type of abstraction within a couple of seconds. The 
number of state-bits dropped to 160 in the abstract model. Propositional abstraction has 
also been successfully applied to a DC application [13], which originally contained four 
32-bit integer and two real variables. The automatically abstracted version used only 
80 state-bits and 7800 BDD nodes and model-checking some relevant safety properties 
was performed successfully within 60 seconds. 

4 Integration 

The tool-set for Statemate verification is enhanced with a graphical interface which 
significantly eases its use. All verification related activities can be performed by graphical 
operations. Compiling the design to the internal representation for the verification [4] as 
well as model analysis described in Section 2 can be initiated directly from appropriate 
icons. 

The verification environment provides a behavioral view for each activity of the Statemate 
design. This allows model analysis to be applied to each activity separately. Results 
are recorded in a report. Witness paths found by the analysis operations are translated into 
simulation control programs [4], which can be used to drive the Statemate simulator. 

Verification of specific properties of the design under consideration is supported by 
a graphical specification formalism integrated in the user interface. For each activity of 
the design the user can state requirements using predefined specification patterns, which 
can also be employed to express assumptions about the environment of an activity. 
Typical properties for verification are offered as a library of patterns, which can easily 
be instantiated in customized specifications. 

The user interface ensures automatic translation of requirements into temporal logic 
formulae [9], while assumptions about the environment of an activity are compiled into 
observer automata. Verification is done by adding the observers for the assumptions 
plus fairness constraints to the model [6] and performing regular model checking using 
the CTL model checker VIS. This process is managed by the user interface, hiding all 
control aspects from the user. 

Creation of proof obligations and execution of proof tasks is integrated in the user 
interface. The interface keeps track of verification results. Proof-results are automatically 
invalidated by changes in the design or modifications of the specification. Fine-grained 
dependency management ensures minimal invalidation. Proofs can be established again 
wrt. the changes by re-executing the affected proof tasks. Witness sequences are collected 
and managed by the environment. The user can easily animate runs of the activity 
violating requirements by using the Statemate simulator. Additionally, a violation can 
be displayed as a set of waveforms. 

The user interface offers propositional abstraction for each proof task. The abstraction 
iterations discussed in Section 3 are recorded and automatically reapplied when the proof 
task needs to be re-executed, for example, if some changes are made to the model. 

Compositional reasoning is guided by the user interface as described in [2]. Specifications 
of sub-components can be used hierarchically to derive specifications of composed 
activities of the design. Such structural implications are maintained wrt. the verification 
results for the sub-activities. 



5 Conclusion 

We have described key enhancements taking a prototype verification system for Statemate 
to a verification environment now available as a commercial offering from I-Logix. 
As part of the ongoing cooperation, further extensions of the technology are under de- 
velopment, including in particular support for industrially relevant classes of hybrid 
controller models and support for a recently developed extension of Message Sequence 
Charts called Live Sequence Charts [8]. Within the SafeAir project, the same verification 
technology is linked to other modeling tools used by our partners Aerospatiale, DASA, 
Israeli Aircraft Industries, and SNECMA. In cooperation with BMW and DC we are 
integrating further modeling tools. 

1 Introduction 

In recent years, a number of cryptographic protocols have been mechanically 
verified using a variety of inductive methods (e.g., [4,3,5]). These proofs typically 
require defining a number of recursive sets of messages, and require deep insight 
into why the protocol is correct. As a result, these proofs often require days to 
weeks of expert effort. 

We have developed an automatic verifier, TAPS, that seems to overcome 
these problems for many cryptographic protocols. TAPS uses the protocol text 
to construct a number of first-order invariants; the proof obligations justifying 
these invariants, along with any user-specified protocol properties (e.g. message 
authentication), are proved from the invariants with a resolution theorem prover. 
The only flexibility in constructing these invariants is to guess, for each type of 
nonce^ and encryption generated by the protocol, a formula capturing conditions 
necessary for that nonce/encryption to be published (i.e., sent in the clear). 
TAPS chooses these formulas heuristically, attempting to match the designer’s 
intent as expressed in traditional protocol notations (for example, the choice 
can be influenced by the formally irrelevant use of the same variable name in 
different transitions). When necessary, the user can override these choices, but 
TAPS usually needs these hints only for recursive protocols and certain types of 
nested encryptions. Justifying the invariants usually requires substantial protocol 
reasoning; proving the invariants in a single simultaneous induction is critical to 
making the proofs work. 

TAPS has verified properties of about 60 protocols, including (variants of) 
all but 3 protocols from the Clark & Jacob survey [1]. 90% of these protocols 
require no hints from the user; the remainder require on average about 40 bytes of 
user input (usually a single formula). The average verification time for these 
protocols is under 4 seconds (on a 266MHz PC), and TAPS verifications seem 
to require about an order of magnitude less user time than equivalent Isabelle 
verifications. Although TAPS cannot generate counterexamples, it can quickly 
verify many protocols without the artificial limitations on protocol or state space 
required by model checking approaches. 

For a more complete description of TAPS, see [2]. 

^ We use “nonce” to describe any freshly generated unguessable value (e.g. including 
session keys). 



E.A. Emerson and A.P. Sistla (Eds.): CAV 2000, LNCS 1855, pp. 568—571, 2000. 
© Springer- Verlag Berlin Heidelberg 2000 




TAPS: A First-Order Verifier for Cryptographic Protocols 569 



Protocol NeedhamSchroederLowe /* 0.7 sec */ 
/* k(X) = X’s public key, dk(k(X)) <=> X has been compromised */ 
Definitions { 
    m0 = {A,Na}_k(B)   m1 = {B,Na,Nb}_k(A)   m2 = {Nb}_k(B) 
} 
Transitions { 
    /* A->B   */ Na: pub(A) /\ pub(B)  -p0-> m0 
    /* B->A   */ Nb: pub(B) /\ pub(m0) -p1-> m1 
    /* A->B   */     p0 /\ pub(m1)     -p2-> m2 
    /* B      */     p1 /\ pub(m2)     -p3-> {} 
    /* oopsNa */     p0 /\ dk(k(A))    -oopsNa-> Na 
    /* oopsNb */     p1 /\ dk(k(B))    -oopsNb-> Nb 
} 
Axioms { k injective } 
Goals { /* If either A or B has executed his last step and neither is 
           compromised, then his partner has executed the preceding 
           step, with agreement on A,B,Na,Nb */ 
    p2 => p1 \/ dk(k(A)) \/ dk(k(B)) 
    p3 => p2 \/ dk(k(A)) \/ dk(k(B)) 
} 



Fig. 1. TAPS input for the Needham-Schroeder-Lowe protocol 



2 The Protocol Model 

Figure 1 shows TAPS input for the Needham-Schroeder-Lowe (NSL) protocol. 
Each protocol makes use of an underlying set of messages whose structure is given 
by a first-order theory. Identifiers starting with uppercase letters (A, B, Na, ...) 
are first-order variables, ranging over messages; the remainder are first-order 
functions (k), first-order predicates, history predicates (p0, p1, p2, p3), and the 
unary predicate pub. The message theory includes the list constructors nil and 
cons, enc (encryption), and the predicates atom (unary) and d (d(X,Y) means 
that messages encrypted using X can be decrypted using Y), as well as any 
functions mentioned in the protocol (like k in the example). The first-order 
theory says that nil, cons, and enc are injective, with disjoint ranges, and do 
not yield atoms. The user can provide arbitrary first-order axioms in the Axioms 
section. Lists in braces are right-associated cons lists, and _ is infix encryption 
(e.g., {Nb}_k(B) abbreviates enc(k(B), cons(Nb, nil))). 
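
The way the brace and underscore notation unfolds into cons, nil, and enc terms can be made concrete with a few constructor functions. The tuple encoding of terms and the helper names braces and encrypted are assumptions chosen for this sketch; TAPS itself works on its own first-order term representation.

```python
NIL = ("nil",)


def cons(head, tail):
    return ("cons", head, tail)


def enc(key, body):
    return ("enc", key, body)


def k(agent):
    """Public key of an agent, as in the NSL example."""
    return ("k", agent)


def braces(*items):
    """{a, b, c} as a right-associated cons list ending in nil."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result


def encrypted(items, key):
    """{items}_key, i.e. infix encryption applied to a braces list."""
    return enc(key, braces(*items))


# The three NSL messages from Figure 1, with variables written as strings.
m0 = encrypted(["A", "Na"], k("B"))
m1 = encrypted(["B", "Na", "Nb"], k("A"))
m2 = encrypted(["Nb"], k("B"))
print(m2)
```

Printing m2 shows the nested term that the abbreviation {Nb}_k(B) stands for.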

Each protocol defines an (infinite state) transition system. The state of the 
system is given by interpretations assigned to the history predicates and pub. 
These interpretations grow monotonically, so any positive formula (one in which 
history predicates and pub occur only with positive polarity) is guaranteed to be 
stable (once true, it remains true). The abbreviation dk(X) (“X is a decryptable 
key”) is defined by $dk(X) \Leftrightarrow (\exists Y : d(X,Y) \wedge pub(Y))$. 

The transitions of the protocol are each of the form 

$$nv_p : g_p \xrightarrow{\;p\;} M_p$$ 

where p is a history predicate, $nv_p$ is an optional list of variables (the nonce 
variables of p), $g_p$ is a positive formula, and $M_p$ is a message term. For each p, 
TAPS generates a minimal signature (list of distinct variables) $\Sigma_p$ that includes 
all the free variables in $nv_p$, $g_p$, and $M_p$; used as a predicate, p abbreviates 
$p(\Sigma_p)$.^ For example, in NSL, $\Sigma_{p0} = \Sigma_{oopsNa} = A, B, Na$, and 
$\Sigma_{p1} = \Sigma_{p2} = \Sigma_{p3} = \Sigma_{oopsNb} = A, B, Na, Nb$. 

A transition describes two separate atomic actions. In the first action, the 
system (1) chooses arbitrary values for all variables in $\Sigma_p$, such that variables 
in $nv_p$ are assigned fresh, distinct atoms; (2) checks that $g_p$ holds in the current 
state; (3) adds the tuple $(\Sigma_p)$ to the interpretation of p, and (4) checks that all 
the axioms hold. In the second action, the system (1) chooses an arbitrary tuple 
from the interpretation of p, (2) publishes the corresponding message $M_p$, and 
(3) checks that the axioms hold. Execution starts in the state where all history 
predicates have the empty interpretation, no messages are published, and all the 
axioms hold. 

In addition, each protocol implicitly includes transitions modeling the ability 
of the spy to generate (and publish) new messages; these transitions generate 
fresh atoms, tuple, untuple, and encrypt previously published messages, and 
decrypt previously published messages encrypted under decryptable keys. 
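
The analysis half of the spy's abilities (untupling and decryption under decryptable keys) can be sketched as a closure computation over a finite set of published messages. The synthesis half (fresh atoms, tupling, encrypting) generates infinitely many messages and is left out. The tuple encoding of terms, the hypothetical inverse-key constructor inv, and the function names are assumptions for illustration only.

```python
NIL = ("nil",)


def analyze(published, d):
    """Close a set of published messages under untupling and decryption.

    d(X, Y) says that messages encrypted using X can be decrypted using Y,
    so a key X is decryptable when some published Y satisfies d(X, Y).
    """
    pub = set(published)
    changed = True
    while changed:
        changed = False
        for msg in list(pub):
            learned = []
            if isinstance(msg, tuple) and msg[0] == "cons":
                learned = [msg[1], msg[2]]             # untuple
            elif isinstance(msg, tuple) and msg[0] == "enc":
                key, body = msg[1], msg[2]
                if any(d(key, y) for y in pub):        # dk(key)
                    learned = [body]                   # decrypt
            for item in learned:
                if item not in pub:
                    pub.add(item)
                    changed = True
    return pub


# Hypothetical key theory: inv(K) decrypts what was encrypted under K.
def d(x, y):
    return y == ("inv", x)


kB = ("k", "B")
m2 = ("enc", kB, ("cons", "Nb", NIL))   # {Nb}_k(B)

print("Nb" in analyze({m2}, d), "Nb" in analyze({m2, ("inv", kB)}, d))
```

With only m2 published the nonce stays hidden; once a decrypting key for k(B) is also published, the spy obtains Nb.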

3 Generating the Invariants 

To generate the invariants, TAPS has to choose a formula $L_v$ for each nonce 
variable v (giving conditions under which a freshly generated v atom might 
be published) and a formula for each encryption subterm of each $M_p$ (giving 
conditions under which the subterm might be published). The user can influence 
these choices by providing formulas for some of the $L_v$'s or providing explicit 
labels for some of the subterms of the $M_p$'s. TAPS calculates these formulas as 
follows. Let S initially be the set of all formulas $p(\Sigma_p) \Rightarrow ok(M_p)$, and let T 
initially be the empty set; TAPS repeatedly 

- replaces a formula $f \Rightarrow ok(cons(X,Y))$ in S with $f \Rightarrow ok(X)$ and $f \Rightarrow ok(Y)$; 

- removes a formula $f \Rightarrow ok(nil)$ from S; 

- replaces a formula $f \Rightarrow ok(X)$ in S, where X is explicitly labeled by the user 
with the formula g, with $f \Rightarrow g$ and $g \Rightarrow ok(X)$; 

- replaces a formula $f \Rightarrow ok(enc(X,Y))$ in S with $f \wedge dk(X) \Rightarrow ok(Y)$ and adds 
$f \Rightarrow primeEnc(enc(X,Y))$ to T. 

For example, applying this procedure to the p2 transition of NSL has the net 
effect of adding the formula $p2 \wedge dk(k(B)) \Rightarrow ok(Nb)$ to S and adding the formula 
$p2 \Rightarrow primeEnc(m2)$ to T. 

When this process has terminated, TAPS defines $L_v$ to be the disjunction of 
all formulas f for which $f \Rightarrow ok(v)$ is a formula of S (unless the user has defined 
$L_v$ explicitly), and defines primeEnc to be the strongest predicate satisfying the 
formulas of T (universally quantified). For example, for NSL, TAPS defines 

$$L_{Na} \Leftrightarrow (p0 \wedge dk(k(B))) \vee (p1 \wedge dk(k(A))) \vee oopsNa$$ 
$$L_{Nb} \Leftrightarrow (p1 \wedge dk(k(A))) \vee (p2 \wedge dk(k(B))) \vee oopsNb$$ 
$$primeEnc(X) \Leftrightarrow (\exists A, B, Na, Nb : (X = m0 \wedge p0) \vee (X = m1 \wedge p1) \vee (X = m2 \wedge p2))$$ 

^ TAPS provides an explicit substitution operator to allow history predicates to take 
arbitrary arguments. Default arguments make large protocols much easier to read 
and modify. 
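
The rewriting of S and T and the resulting definitions for NSL can be reproduced with a short worklist sketch. It covers the cons, nil, and enc rules only; the rule for user-supplied labels is omitted. The encoding (a formula in S is a pair of a hypothesis tuple and a term, read as "conjunction of hypotheses implies ok(term)") and all function names are assumptions for illustration, not the TAPS implementation.

```python
NIL = ("nil",)


def cons(head, tail):
    return ("cons", head, tail)


def enc(key, body):
    return ("enc", key, body)


def k(agent):
    return ("k", agent)


def braces(*items):
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result


def generate(transitions):
    """Rewrite the initial formulas p => ok(M_p) until no rule applies.

    Returns (S, T): S holds pairs (hyps, term) meaning hyps => ok(term),
    T holds pairs (hyps, term) meaning hyps => primeEnc(term).
    """
    work = [((p,), msg) for p, msg in transitions]
    S, T = [], []
    while work:
        hyps, term = work.pop()
        tag = term[0] if isinstance(term, tuple) else None
        if tag == "nil":
            continue                                    # drop f => ok(nil)
        if tag == "cons":
            work.append((hyps, term[1]))                # f => ok(X)
            work.append((hyps, term[2]))                # f => ok(Y)
        elif tag == "enc":
            work.append((hyps + (("dk", term[1]),), term[2]))
            T.append((hyps, term))
        else:
            S.append((hyps, term))                      # irreducible
    return S, T


def leak_condition(S, v):
    """L_v as a set of disjuncts, each a tuple of conjoined hypotheses."""
    return {hyps for hyps, term in S if term == v}


m0 = enc(k("B"), braces("A", "Na"))
m1 = enc(k("A"), braces("B", "Na", "Nb"))
m2 = enc(k("B"), braces("Nb"))
nsl = [("p0", m0), ("p1", m1), ("p2", m2), ("p3", NIL),
       ("oopsNa", "Na"), ("oopsNb", "Nb")]

S, T = generate(nsl)
print(leak_condition(S, "Na"))
print(leak_condition(S, "Nb"))
```

Each disjunct printed for Na and Nb corresponds to one disjunct of the definitions of L_Na and L_Nb above, and T contains one entry per encrypted message m0, m1, m2.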

TAPS then proposes the following invariants (in addition to the axioms of the 
underlying first-order theory, and any axioms specified in the Axioms section): 

(1) $p(\Sigma_p) \Rightarrow g_p \wedge (\forall v : v \in nv_p : atom(v))$ 

(2) $p(\Sigma_p) \wedge p(\Sigma'_p) \wedge v = v' \Rightarrow \Sigma_p = \Sigma'_p$ 

(3) $p(\Sigma_p) \wedge q(\Sigma'_q) \Rightarrow v \neq w$ for distinct $v \in \Sigma_p$, $w \in \Sigma'_q$ 

(4) $pub(X) \Rightarrow ok(X)$ 

(5) $ok(nil)$ 

(6) $ok(cons(X,Y)) \Leftrightarrow ok(X) \wedge ok(Y)$ 

(7) $ok(enc(X,Y)) \Leftrightarrow primeEnc(enc(X,Y)) \vee (pub(X) \wedge pub(Y))$ 

(8) $p(\Sigma_p) \Rightarrow (ok(v) \Leftrightarrow (\exists V : L_v))$ for $v \in nv_p$, 

where V is the set of free variables of $L_v$ not in $\Sigma_p$. 

These formulas say (1) each history predicate implies its corresponding guard, 
and nonce variables are instantiated to atoms; (2)-(3) no atom is used more than 
once as the instantiation of a nonce variable; (4) all published messages are ok; 
(5)-(6) a tuple is ok iff each of its components is ok; (7) an encryption is ok iff it 
is either a primeEnc or the encryption of published messages; and (8) an atom 
used to instantiate a nonce variable v is ok iff $L_v$ holds. 

The remaining formulas of S (universally quantified) are left as proof
obligations; if these formulas follow from the invariants, then the invariants hold in
all reachable states. The invariants are then used to prove the goals.

1 Introduction

Asynchronous circuit design can potentially avoid various problems that arise
in designing large synchronous circuits, such as clock skew and high power
consumption. On the other hand, verifying asynchronous circuits usually costs
much more than verifying synchronous circuits, because every change on every
wire must be taken into account to capture the behavior of an asynchronous
circuit, unlike in the synchronous case. Furthermore, asynchronous circuit
designers have recently preferred timed circuits for implementing fast and
compact circuits. This trend increases the cost of verification and, at the
same time, the demand for formal verification tools. VINAS-P is our newest
formal verification tool for timed asynchronous circuits, using the techniques
proposed in [1]. The main idea in these techniques is partial order reduction
based on the timed version [2] of the stubborn set method [3].

This short paper mainly introduces what VINAS-P can verify and how we 
use it. We are planning to release VINAS-P on our web site soon. 

2 What Can VINAS-P Verify? 

In order to formally verify timed circuits, VINAS-P uses the timed version [1] of
the trace theoretic verification method [4], with time Petri nets as its internal
model. Thus, the implementation and the specification are both modeled by
time Petri nets, and VINAS-P reports whether the implementation conforms
to the specification. Since the time Petri nets for an implementation are
automatically generated from a Verilog-like description and the VINAS-P gate
library, users usually need to handle a time Petri net only for the specification.
The current version of VINAS-P verifies safety properties. That is, it checks
whether a failure state, in which the circuit produces an output that the
specification does not expect (i.e., the specification is not ready to accept
the output), is reachable. Any causality relation between input and output
wires can be expressed in the specifications. Checking liveness properties
(e.g., that some output is actually produced) will be supported in the future.

Here, we will demonstrate verification using VINAS-P with a simple example.
Consider the circuit shown in Figure 1(a), which is a two-stage asynchronous
FIFO. The gate with a “C” symbol is actually implemented as shown in (b).
To describe this circuit, we prepare a Verilog-like description as shown in (c).
We do not have to describe primitive gates such as ANDs or ORs, because they
are included in the gate library. The module celm is for the “C” gate. This gate
produces the value v only when v is given to both inputs; otherwise, it holds the
previous value. The delay values (e.g., [1,2]) are specified for each primitive gate.
VINAS-P assumes the bounded delay model, and these delay values represent
the minimal and maximal delays.

[Panels (a), the circuit diagram, and (b), the “C” gate implementation, are not
reproduced here. Panel (c), the Verilog-like description, follows.]

module FIFO;
  fifo_cell cell1(a,b,w1,w2,w3,ack);
  fifo_cell cell2(w2,w3,req,x,y,w1);
  initial
  begin
    #0 x=1; ack=1; req=1; w2@C1@cell2=1;
  end
endmodule

module celm(z,c,d);
  and2 [1,2] AND1(w1,c,d);
  and2 [1,2] AND2(w2,c,z);
  and2 [1,2] AND3(w3,d,z);
  or3 [1,2] OR(z,w1,w2,w3);
endmodule

module fifo_cell(in1,in2,req,out1,out2,ack);
  celm C1(out1,req,in1);
  celm C2(out2,req,in2);
  nor2 [1,2] NOR(ack,out1,out2);
endmodule

Fig. 1. A circuit to be verified.
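To make the hold behavior of the “C” gate concrete, the following Python sketch evaluates the three AND gates and the OR gate of the celm module, together with the NOR gate that produces ack in fifo_cell. It is an untimed illustration only: the [1,2] gate delays are ignored, and the function names are chosen for this sketch rather than taken from VINAS-P.

```python
def c_element(z_prev, c, d):
    """Next output of the celm module: z = (c AND d) OR (c AND z) OR (d AND z)."""
    w1 = c & d        # AND1
    w2 = c & z_prev   # AND2
    w3 = d & z_prev   # AND3
    return w1 | w2 | w3  # OR


def fifo_cell_ack(out1, out2):
    """ack of fifo_cell: NOR of the two cell outputs."""
    return 1 - (out1 | out2)


# Print the full truth table: previous output, both inputs, next output.
for z in (0, 1):
    for c in (0, 1):
        for d in (0, 1):
            print(z, c, d, "->", c_element(z, c, d))
```

When both inputs agree, the output follows them; when they differ, the feedback terms keep the previous output.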

Suppose that this circuit is used in the following environment. Either (0,1)
or (1,0), and then (0,0), is given to (a,b) periodically, say, every 12 time units.
However, due to clock skew, an inaccuracy of up to one time unit can occur. For req,
0 is given when (0,1) or (1,0) is given to (a,b), and 1 for (0,0). What we want
to verify is that this FIFO never overflows, that is, that ack is always
activated (i.e., 0 for (0,1) or (1,0), and 1 for (0,0)) before the next data is
given to (a,b). In order to describe both the environment and this property, we
prepare a time Petri net as shown in Figure 2(a). Some transitions have timing
information like [p, q]. It means that the transition must have been enabled for
p time units or more before it fires, and that it must fire before q time units
have passed unless it is disabled. Thus, the transitions labeled t0 and t1 fire
exactly every 12 time units. The interval [0,1] on some transitions models the
effect of the clock skew. The transitions related to the input wires of the circuit
are labeled “(in)”, and the transitions related to the output wires are labeled
“(out)”. The names of these transitions end with “+” or “-”. The firings of “+”
(“-”) transitions correspond to 0 → 1 (1 → 0) signal transitions. The firings of
these transitions are synchronized with the changes of the corresponding wires of
the circuit. Since the output wires are controlled by the circuit, the specification
cannot specify any timing information for those transitions. They are assumed
to have [0,∞]. The transitions without “(in)” or “(out)” are not related to




574 



T. Yoneda 




Fig. 2. A specification and a failure trace. [Panels not reproduced: (a) the
specification time Petri net; (b) the failure trace.]

any wires, and are called internal transitions. The transitions t2 and t3, which
conflict with the transitions for ack- or ack+, are for checking the properties
to be verified. These transitions become enabled when the change of a or b is
triggered. If the circuit changes ack before these transitions fire,
then the circuit behaves correctly (no overflow occurs). On the other
hand, if ack is not changed for 12 time units, the next data can cause an
overflow. In this case, t2 (or t3) fires and the transitions
for ack become disabled. Thus, when the circuit eventually produces ack,
it produces an output that the specification does not expect.
VINAS-P detects this as a failure. It is straightforward to see
that this time Petri net expresses both the behavior of the environment of the
circuit and the expected behavior of the circuit itself. All we have to do for this
verification is to push the “verify” button and see the results. If the circuit does
not conform to the specification (actually it does not), the failure trace shown
in Figure 2(b) is displayed.
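The [p, q] firing rule and the role of the checking transitions can be illustrated with a small Python sketch. The intervals [0,1] and [0,∞] come from the text; the interval [12,12] used for t0 and t2 is an assumption of this sketch, as are all class and function names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedTransition:
    name: str
    earliest: float  # p: minimum time the transition must stay enabled
    latest: float    # q: it must fire (or be disabled) by this time

    def may_fire(self, enabled_for):
        """True if firing is allowed after being enabled for `enabled_for` units."""
        return self.earliest <= enabled_for <= self.latest


def ack_precedes_watchdog(ack_delay, watchdog):
    """True if ack is certain to fire before the checking transition can."""
    return ack_delay < watchdog.earliest


# Assumed intervals: [12,12] for the periodic and checking transitions is a
# guess made for this sketch; [0,1] and [0,inf] are taken from the text.
t0 = TimedTransition("t0", 12, 12)
skew = TimedTransition("skew", 0, 1)
ack_minus = TimedTransition("ack-", 0, math.inf)
t2 = TimedTransition("t2", 12, 12)

print(t0.may_fire(12), t0.may_fire(11.5))
print(ack_precedes_watchdog(7, t2), ack_precedes_watchdog(12, t2))
```

If ack can arrive only at or after the moment the checking transition becomes fireable, the conflict may be resolved in favor of t2, which is the situation reported as a failure.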

The major advantage of VINAS-P is that verification can be performed without
traversing the entire state space of the given system. This is made possible by
the partial order reduction technique, which often reduces the verification cost
dramatically. For example, an abstracted TITAC2 instruction cache subsystem,
which contains around 200 gates (modeled by over 1500 places and 1500
transitions), was verified in about 15 minutes, using less than 20 MBytes of memory
[5]. No conventional methods can complete this verification.

The most closely related work is probably ATACS developed at the University 
of Utah. It also uses some kind of partial order reduction [6]. However, since 
ATACS only uses restricted information relating to the concurrency between 
events, VINAS-P is expected to be faster in many cases. On the other hand, the 
expressibility of the ATACS modeling language is superior to that of VINAS-P. 

3 User Interface 

The verifier core of VINAS-P, written in C, is almost stable, and the graphical
user interface, written in Java, is being improved. The interface can select
circuit files and specification Petri net files, and launch editors for them.
The Petri net editor (see the view in Figure 2(a)) recognizes Petri net objects
such as transitions, places, and arcs, and is designed so that Petri nets can be
created or modified easily. Information about the Petri net objects created by
this editor is also used when browsing failure traces.

When VINAS-P detects a failure state, it shows a trace which leads the
system from the initial state to the failure state, as shown in Figure 2(b). (This
trace is obtained simply from the recursion stack, and thus it contains no timing
information.) The user can select a set of wires to be shown. When this failure trace
is displayed, the specification net is also shown, and by moving the vertical
cursor on the failure trace, the corresponding marking of the specification net is
displayed by an animation. This function of VINAS-P helps users tremendously
in debugging circuits.
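The remark that the trace comes from the recursion stack can be illustrated with a plain depth-first search in Python. This is a generic sketch, not the VINAS-P algorithm: it performs no partial order reduction and handles no timing, and the state graph and all names are hypothetical.

```python
def find_failure_trace(state, successors, is_failure, visited=None, trace=None):
    """Depth-first search; returns the list of transition labels on the
    current recursion path when a failure state is reached, else None."""
    if visited is None:
        visited, trace = set(), []
    if is_failure(state):
        return list(trace)
    visited.add(state)
    for label, nxt in successors(state):
        if nxt in visited:
            continue
        trace.append(label)          # mirror the recursion stack
        found = find_failure_trace(nxt, successors, is_failure, visited, trace)
        if found is not None:
            return found
        trace.pop()                  # backtrack
    return None


# Hypothetical toy state graph; "fail" plays the role of a failure state.
graph = {
    "init": [("a+", "s1"), ("b+", "s2")],
    "s1": [("req-", "init")],
    "s2": [("t2", "s3")],
    "s3": [("ack-", "fail")],
    "fail": [],
}
trace = find_failure_trace("init", lambda s: graph[s], lambda s: s == "fail")
print(trace)
```

Because the trace is just the sequence of labels on the search path, it records the order of events but not when they occurred.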

For early stages of debugging circuits, VINAS-P also provides a function
which we call “guided simulation”. It allows users to easily specify input
sequences for simulation and to browse the output traces.

4 Future Works 

From some experimental results [7], we feel that the approach based on
compressing and thinning out the visited state information is more effective in
reducing memory usage than symbolic approaches based on decision diagrams,
especially for the partial order reduction technique of timed systems. We are
currently implementing this idea in VINAS-P.

At present, we are also designing a specification description language for
VINAS-P in order to easily handle large specifications. Although CCS, CSP,
and their extensions can be used for this purpose, we do not think that they are
easy for nonexperts to use. Instead, we have designed a Java-like language, and
are now checking its expressibility in comparison with Petri nets.

1 Introduction

XMC is a toolset for specifying and verifying concurrent systems.¹ Its main mode
of verification is temporal-logic model checking [CES86], although equivalence
checkers have also been implemented. In its current form, temporal properties
are specified in the alternation-free fragment of the modal mu-calculus [Koz83],
and system models are specified in XL, a value-passing language based on
CCS [Mil89]. The core computational components of the XMC system, such
as those for compiling the specification language, model checking, etc., are built
on top of the XSB tabled logic-programming system [XSB99].

A distinguishing aspect of XMC is that model checking is carried out as query
evaluation, by building proof trees using tabled resolution. The main advantage
of making proof-tree construction central to XMC is the resultant flexibility
and extensibility of the system. First, XMC provides the foundation
for the XMC-RT [DRS99] model checker for real-time systems, and for
XMC-PS [RKR+00], a verification technique for parameterized systems. Secondly, it
paves the way for building an effective and uniform interface, called the justifier,
for debugging branching-time properties.

The main features of the XMC system are as follows. 

— The specification language, XL, extends value-passing CCS with
parameterized processes, first-class channels, logical variables and computations, and
supports SML-like polymorphic types.

— XL specifications are compiled into efficient automata representations using
techniques described in [DR99]. XMC implements an efficient, local model
checker that operates over these automata representations. The optimization
techniques in the compiler make the model checker comparable, in terms of
performance, to SPIN [HP96] and Murphi [Dil96].

— The model checker is declaratively written in under 200 lines of XSB tabled
Prolog code [RRR+97]. XSB’s tabled-resolution mechanism automatically
yields an on-the-fly, local model checker. Moreover, state representation using
Prolog terms yields a form of data-independence [Wol86], permitting model
checking of certain infinite-state systems.

— The model checker saves “lemmas”, i.e., intermediate steps in the proof of
a property. The XMC justifier extracts a proof tree from these lemmas and
permits the user to interactively navigate through the proof tree.

* Research supported in part by NSF grants EIA-9705998, CCR-9711386,
CCR-9805735, and CCR-9876242.

¹ See http://www.cs.sunysb.edu/~lmc for details on obtaining a copy of the system.

The XMC system has been successfully used for specifying and verifying
different protocols and algorithms such as Rether [CV95], an Ethernet-based
protocol supporting real-time traffic; the Java meta locking algorithm [ADG+99,
BSW00], a low-overhead mutual exclusion algorithm used by Java threads; and
the SET protocol [SET97], an e-commerce protocol developed for
Visa/MasterCard.

Below we describe the salient features of the XMC system. 

2 XL: The Specification Language 

XL is a language for specifying asynchronous concurrent systems. It inherits the
parallel composition (written as ‘|’) and choice (‘#’) operators, the notion of
channels, input (‘?’) and output (‘!’) actions, and synchronization from Milner’s
value-passing calculus. XL also has a sequential composition (‘;’), generalizing
CCS’s prefix operation, and a builtin conditional construct (‘if’). XL’s support
of parameterized processes fills the roles of CCS-style restriction and relabeling.

Complex processes may be defined starting from the elementary actions using 
these composition operations. Process definitions may be recursive; in fact, as in 
CCS, recursion is the sole mechanism for defining iterative processes. Processes 
take zero or more parameters. Process invocations bind these parameters to 
values: data or channel names. 

Data values may be constructed out of primitive types (integers and
booleans), predefined types such as lists (written as [Hd|Tl] and [] for empty list)
or arbitrary user-defined (possibly recursive) types. XL provides primitives for
manipulating arithmetic values; user-defined computation may be specified
directly in XL, or using inlined Prolog predicates. The specification of a FIFO
channel having an unbounded buffer given in Figure 1 illustrates some of these
features.

Type declarations are not always necessary, as the example illustrates. XMC’s
type-inference module automatically infers the most general types for the
different entities in the specification.



3 The XMC Compiler and Model Checker 

The XMC system incorporates an optimizing compiler that translates high-level
XL specifications into rules representing the global transition relation of the
underlying automaton. The transitions can be computed from these rules in unit
time (modulo indexing) during verification. The compiler incorporates several
optimizations to reduce the state space of the generated automaton. One
optimization combines computation steps across boundaries of basic blocks, which
cannot be done based on user annotations alone, and has been shown to be
particularly effective [DR99].

chan(Read, Write, Buf) ::=
    receive(Read, Write, Buf) # {Buf \== []; send(Read, Write, Buf)}.

receive(Read, Write, Buf) ::=
    Read?Msg; chan(Read, Write, [Msg|Buf]).

send(Read, Write, Buf) ::=
    strip_from_end(Buf, Msg, NBuf); Write!Msg; chan(Read, Write, NBuf).

{* % Inlined Prolog code appears between braces
strip_from_end([X], X, []).
strip_from_end([X,Y|Ys], Z, [X|Zs]) :- strip_from_end([Y|Ys], Z, Zs). *}

Fig. 1. Example Specification in XL
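The buffer discipline of the FIFO channel in Figure 1 can be mirrored in a few lines of Python. This is a rough analogue for illustration only, not XL or XMC code: it models just the buffer updates, leaving out channels, choice, and process recursion, and the Python function names are chosen to echo the XL definitions.

```python
def strip_from_end(buf):
    """Python analogue of strip_from_end/3: return (last element, rest)."""
    if not buf:
        raise ValueError("cannot send from an empty buffer")
    return buf[-1], buf[:-1]


def receive(buf, msg):
    """Read?Msg; chan(Read, Write, [Msg|Buf]): prepend the new message."""
    return [msg] + buf


def send(buf):
    """Enabled only for a non-empty buffer: emit the oldest message."""
    msg, rest = strip_from_end(buf)
    return msg, rest


# Receive three messages, then drain the buffer.
buf = []
for m in ("m1", "m2", "m3"):
    buf = receive(buf, m)
out = []
while buf:
    msg, buf = send(buf)
    out.append(msg)
print(out)
```

New messages are added at the front of the list and removed from the end, so messages leave in the order they arrived.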

The mu-calculus model checker in XMC is encoded using a predicate, models,
which verifies whether a state represented by a process term models a given
modal mu-calculus formula. This predicate directly encodes the natural semantics
of the modal mu-calculus [RRR+97]. The encoding reduces model checking to
logic-program query evaluation; the goal-directed evaluation mechanism of XSB
ensures that the resultant model checker is local.
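The idea of a local, goal-directed check can be approximated in Python for one simple least-fixed-point property: whether some reachable state satisfies a given predicate. This sketch is an analogy rather than a rendering of the models predicate: the set `table` plays the role of the table of subgoals already called, and the function name and toy transition system are hypothetical.

```python
def models_ef(start, successors, holds):
    """Check the least-fixed-point property 'some reachable state satisfies
    holds' from `start`, exploring states only on demand.

    Returns (answer, number of states entered in the table).
    """
    table = {start}          # subgoals already called, as in a tabled evaluation
    stack = [start]
    while stack:
        state = stack.pop()
        if holds(state):
            return True, len(table)    # stop as soon as a witness is found
        for nxt in successors(state):
            if nxt not in table:       # a repeated subgoal is not re-evaluated
                table.add(nxt)
                stack.append(nxt)
    return False, len(table)


# Hypothetical cyclic system: a counter modulo 5.
def step(n):
    return [(n + 1) % 5]


print(models_ef(0, step, lambda n: n == 3))
print(models_ef(0, step, lambda n: n == 7))
```

The first query stops before the whole cycle is explored; the second terminates despite the cycle because no state is entered in the table twice.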

Various statistics regarding a model-checking run, such as the memory usage,
may be directly obtained using primitives provided by the underlying XSB
system. In addition, certain higher-level statistics, such as the total number of
states in the system, are provided by the XMC system.



4 Justifier 

Tabled resolution of logic programs proceeds by recording subgoals (“lemmas”)
and their provable instances in tables. Thus, after a goal is resolved, the relevant
parts of the proof tree can be reconstructed by inspecting the tables themselves.
In XMC, model checking is done by resolving a query to the models predicate.
The justifier inspects the tables after a model-checking run to create a
justification tree: a representation of the proof tree or the set of all failed proof paths,
depending on whether the verification succeeded or failed, respectively.
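As a rough illustration of rebuilding a proof tree from recorded lemmas, the Python sketch below walks a table that maps each proved goal to the subgoals its proof used. The table layout, the goal strings, and the depth limit are assumptions of this sketch and do not reflect the data structures of XSB or the XMC justifier.

```python
def render(goal, table, depth_limit, indent=0):
    """Return the lines of an indented proof tree rooted at `goal`.

    `table` maps a proved goal to the list of subgoals used to prove it;
    subtrees below `depth_limit` are shown as "...".
    """
    lines = ["  " * indent + goal]
    subgoals = table.get(goal, [])
    if subgoals and depth_limit == 0:
        lines.append("  " * (indent + 1) + "...")
        return lines
    for sub in subgoals:
        lines.extend(render(sub, table, depth_limit - 1, indent + 1))
    return lines


# Hypothetical table left behind by a successful run.
table = {
    "s0 |= EF done": ["s0 -> s1", "s1 |= EF done"],
    "s1 |= EF done": ["s1 |= done"],
}
for line in render("s0 |= EF done", table, depth_limit=1):
    print(line)
```

Raising `depth_limit` expands more of the tree, while lowering it truncates subtrees.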

The justification tree is usually too large for manual inspection. Hence, XMC
provides an interactive proof-tree navigator which permits the user to expand
or truncate subtrees of the proof. Each node in the proof tree corresponds to
computing a single-step transition or a subgoal to the models predicate; at each
node the justifier interface shows the values of the program counters and other
variables of each local process corresponding to the current global state.

5 Future Work

Work to extend the XMC system is proceeding in several directions. First, we are
adding a local LTL model checker to the system. Secondly, we are expanding the
class of systems that can be verified by incorporating a model checker for
real-time systems, XMC-RT [DRS99], built by adding a constraint library to XSB.
Thirdly, we plan to add deductive capabilities to XMC by incorporating our
recent work in automatically constructing induction proofs for verifying
parameterized systems [RKR+00]. Finally, we are enhancing the proof-tree navigator
by integrating message sequence charts for better system visualization.